Symbolic regression

This module contains the implementation of the symbolic regression (SR) methods.

Overview

This module implements symbolic regression techniques on top of the qdisc representation. It includes helpers for the three objectives (SR1/SR2/SR3), dataset preparation routines, and a SymbolicRegression training wrapper.

Ansätze

For the moment, only the two-body correlator ansatz (2BC) is implemented as TwoBodyModel.


source

TwoBodyModel


def TwoBodyModel(
    pairs, key, add_constant:bool=False
):

A simple two-body model where the output is a sum of pairwise interactions between input features. Each pairwise interaction is parameterized by a learnable coefficient (alpha). The model can optionally include a constant term.
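As a rough illustration of the description above, the sketch below evaluates a sum of pairwise terms in plain Python. It assumes each interaction is the product x[i] * x[j] weighted by one coefficient per pair; the function name `two_body_predict` and its arguments are hypothetical and are not the TwoBodyModel API.

```python
from itertools import combinations


def two_body_predict(x, pairs, alpha, constant=0.0):
    """Return constant + sum_k alpha[k] * x[i_k] * x[j_k] over the given pairs.

    Assumption: the pairwise interaction is a plain product of two features.
    """
    return constant + sum(a * x[i] * x[j] for a, (i, j) in zip(alpha, pairs))


# All index pairs (i < j) for three input features.
pairs = list(combinations(range(3), 2))  # [(0, 1), (0, 2), (1, 2)]
alpha = [0.5, -1.0, 2.0]                 # one coefficient per pair (illustrative values)
x = [1, -1, 1]                           # a single sample with +/-1 entries

y = two_body_predict(x, pairs, alpha)
y_with_constant = two_body_predict(x, pairs, alpha, constant=1.0)
```

The optional `constant` argument plays the role that `add_constant=True` would play in the model: it adds one extra term that does not depend on the inputs.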


source

TwoBodyModel.predict


def predict(
    X
):

Predict the output of the model given input X.

Losses

The functions below implement the losses used to train the SR models under each objective.


source

loss_SR3


def loss_SR3(
    alpha, model, dataset, vk, G, L1_reg, options
):

Loss used for the 2BC with SR3.


source

derivative_loss_alpha_multi_vk


def derivative_loss_alpha_multi_vk(
    tree, X, options, vk, G
):

Loss projecting and aligning the gradient.

Args:

dy: (S, N) gradients of tree w.r.t. inputs
vk: (S, N, m) projector vectors per sample (m projectors)
G: (S, m) target projected gradients per sample

Returns the MSE between the normalized projected dy and the normalized G (averaged over m).
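The shapes listed for dy, vk and G can be made concrete with a small sketch in plain Python. It assumes that "normalized" means scaling each sample's m-component vector to unit L2 norm, which the docstring does not specify; the names `_unit` and `projected_alignment_loss` are hypothetical.

```python
import math


def _unit(v, eps=1e-12):
    """Scale a vector to (approximately) unit L2 norm; eps avoids division by zero."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / (norm + eps) for c in v]


def projected_alignment_loss(dy, vk, G):
    """MSE between normalized projections of dy onto vk and normalized targets G.

    dy: S lists of length N, vk: S lists of N lists of length m, G: S lists of length m.
    """
    total, count = 0.0, 0
    for dy_s, vk_s, g_s in zip(dy, vk, G):
        m = len(g_s)
        # Project the gradient of sample s onto each of its m projector vectors.
        proj = [sum(dy_s[n] * vk_s[n][k] for n in range(len(dy_s))) for k in range(m)]
        p_hat, g_hat = _unit(proj), _unit(g_s)
        total += sum((p - g) ** 2 for p, g in zip(p_hat, g_hat))
        count += m
    return total / count


dy = [[1.0, 0.0]]                  # S = 1 sample, N = 2 inputs
vk = [[[1.0, 0.0], [0.0, 1.0]]]    # m = 2 projectors (identity basis)
aligned = projected_alignment_loss(dy, vk, [[2.0, 0.0]])
opposed = projected_alignment_loss(dy, vk, [[-2.0, 0.0]])
```

Because both vectors are normalized, the loss in this sketch depends only on the direction of the projected gradient, not on its magnitude: `aligned` is essentially zero while `opposed` is close to 2.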


source

loss_SR2


def loss_SR2(
    alpha, model, dataset, G, L1_reg, options
):

Loss used for the 2BC with SR2.


source

derivative_loss_alpha_multi


def derivative_loss_alpha_multi(
    tree, X, options, G
):

MSE loss aligning the gradient.


source

loss_SR1


def loss_SR1(
    alpha, model, X, Y, L1_reg, options
):

Loss used for the 2BC with SR1.


source

class_loss


def class_loss(
    tree, X, options, Y
):

Classification loss used for the 2BC with SR1.


source

FFNN_theta_to_mu


def FFNN_theta_to_mu(
    hidden_dim:int, num_layers:int, parent:Union=<flax.linen.module._Sentinel object at 0x7f74545e5280>,
    name:Optional=None
)->None:

Simple feed-forward network mapping theta to mu1, used for SR3.

Performance helpers

The functions below help assess the performance and quality of the discovered expressions.


source

auc_from_scores_labels


def auc_from_scores_labels(
    scores, labels
):

Compute AUC from scores and binary labels.
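One common way to compute an AUC from scores and binary labels is the pairwise (Mann-Whitney) formulation sketched below: the fraction of positive/negative pairs in which the positive sample has the higher score, with ties counted as one half. This is an illustrative sketch, not necessarily how auc_from_scores_labels is implemented; the name `auc_pairwise` is hypothetical.

```python
def auc_pairwise(scores, labels):
    """AUC as the probability that a positive sample outscores a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # Count a win as 1, a tie as 0.5, a loss as 0 over all positive/negative pairs.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


auc = auc_pairwise([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The double loop costs O(P * N) comparisons, which is fine for a sketch; a rank-based version gives the same value in O(n log n).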


source

compare_theta_corr


def compare_theta_corr(
    theta, X, C, method:str='pearson', use_upper:bool=True
):

Compare the learned theta matrix to the empirical correlation matrix C using the specified method (pearson, cosine, or spearman). If use_upper=True, only compare the upper triangular parts of the matrices.


source

spearman_rho


def spearman_rho(
    a, b
):

Compute Spearman’s rank correlation coefficient between two vectors a and b.
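Spearman's coefficient is the Pearson correlation of the ranks of the two vectors. The sketch below shows that construction in plain Python, assuming tied values receive the average of their rank positions; the helper names are hypothetical and do not come from the module.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(a, b):
    """Plain Pearson correlation (no guard against constant vectors)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(a, b):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# A monotonic but nonlinear relation still gives rho = 1.
rho = spearman([1.0, 2.0, 3.0, 4.0], [1.0, 8.0, 27.0, 64.0])
```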


source

cosine_sim


def cosine_sim(
    a, b, eps:float=1e-12
):

Compute cosine similarity between two vectors a and b.


source

pearson_between_vectors


def pearson_between_vectors(
    a, b, eps:float=1e-12
):

Compute Pearson correlation between two vectors a and b.
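The cosine similarity and the Pearson correlation are closely related: Pearson is the cosine similarity of the mean-centered vectors. The sketch below illustrates this in plain Python. Where exactly `eps` enters the denominator is an assumption made here for illustration; the function names are hypothetical.

```python
import math


def cosine_similarity(a, b, eps=1e-12):
    """dot(a, b) / (|a| * |b| + eps); eps keeps zero vectors from dividing by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def pearson_correlation(a, b, eps=1e-12):
    """Pearson correlation as the cosine similarity of mean-centered vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b], eps)


a, b = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]
cos_ab = cosine_similarity(a, b)        # below 1: the vectors are not parallel
pearson_ab = pearson_correlation(a, b)  # about 1: b is a shifted copy of a
```

The example shows why the two measures can disagree: a constant offset changes the cosine similarity but leaves the Pearson correlation untouched.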


source

flatten_upper


def flatten_upper(
    mat
):

Flatten the upper triangular part of a matrix.
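A minimal sketch of flattening the upper triangle of a square matrix stored as nested lists. Whether flatten_upper includes the diagonal is not stated, so the sketch exposes it as a hypothetical `include_diagonal` flag and excludes the diagonal by default.

```python
def flatten_upper_triangle(mat, include_diagonal=False):
    """Return the upper-triangular entries of a square matrix in row-major order."""
    n = len(mat)
    offset = 0 if include_diagonal else 1
    return [mat[i][j] for i in range(n) for j in range(i + offset, n)]


mat = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
strict_upper = flatten_upper_triangle(mat)
with_diagonal = flatten_upper_triangle(mat, include_diagonal=True)
```

For a symmetric matrix such as a correlation matrix, the strict upper triangle holds each distinct pair (i, j) with i < j exactly once, which avoids double counting when two matrices are compared.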


source

empirical_corr_matrix


def empirical_corr_matrix(
    X, centered:bool=True
):

Compute empirical correlation matrix from data X. If centered=True, center the data by subtracting the mean before computing correlations.
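The effect of the `centered` flag can be sketched as follows. The sketch assumes the matrix is the sample average of x_i * x_j, computed after subtracting the per-feature mean when `centered=True`, with no normalization by the standard deviation; the actual empirical_corr_matrix may differ on that point. The function name is hypothetical.

```python
def empirical_correlations(X, centered=True):
    """Average of x_i * x_j over samples, optionally after mean subtraction."""
    n_samples, n_features = len(X), len(X[0])
    if centered:
        means = [sum(row[i] for row in X) / n_samples for i in range(n_features)]
        X = [[row[i] - means[i] for i in range(n_features)] for row in X]
    return [
        [sum(row[i] * row[j] for row in X) / n_samples for j in range(n_features)]
        for i in range(n_features)
    ]


# Four samples of two +/-1 features; feature 0 has mean 0.5, feature 1 has mean 0.
X = [[1, 1], [-1, -1], [1, -1], [1, 1]]
C_raw = empirical_correlations(X, centered=False)
C_centered = empirical_correlations(X, centered=True)
```

In this example the diagonal entry for feature 0 drops from 1.0 to 0.75 once the mean is removed, while the off-diagonal entry stays at 0.5 because feature 1 has zero mean.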


source

curved_edge


def curved_edge(
    ax, x1, y1, x2, y2, curvature:float=0.2, plot_kwargs:VAR_KEYWORD
):

Draw a curved (quadratic Bézier) edge between (x1,y1) and (x2,y2). curvature > 0 bends left, curvature < 0 bends right. Used to visualize pairwise interactions in the 2BC.
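The geometry of a quadratic Bézier edge can be sketched without any plotting library. The sketch assumes the control point is the midpoint of the segment, shifted along the left-hand normal by `curvature` times the segment length, in axes where y increases upward; how curved_edge scales the offset is not documented, so this is illustrative only. The function name and `n_points` are hypothetical.

```python
def quadratic_bezier_edge(x1, y1, x2, y2, curvature=0.2, n_points=20):
    """Sample points on a quadratic Bezier curve from (x1, y1) to (x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    # Control point: midpoint shifted along the left-hand normal (-dy, dx).
    cx = (x1 + x2) / 2 - curvature * dy
    cy = (y1 + y2) / 2 + curvature * dx
    points = []
    for step in range(n_points + 1):
        t = step / n_points
        # B(t) = (1 - t)^2 P0 + 2 (1 - t) t C + t^2 P1
        bx = (1 - t) ** 2 * x1 + 2 * (1 - t) * t * cx + t ** 2 * x2
        by = (1 - t) ** 2 * y1 + 2 * (1 - t) * t * cy + t ** 2 * y2
        points.append((bx, by))
    return points


# Edge from (0, 0) to (1, 0): positive curvature bends towards +y (left of travel).
pts = quadratic_bezier_edge(0.0, 0.0, 1.0, 0.0, curvature=0.2, n_points=2)
```

The sampled points can then be passed to a line-plotting call on the axes, with a negative curvature mirroring the bend to the other side.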

SR class

The main entry point of SR is the SymbolicRegression class, which exposes the training and analysis workflow for symbolic regression on qdisc data.


source

SymbolicRegression


def SymbolicRegression(
    dataset:Dataset, cluster_idx_in:Array, objective:str, type_of_vk:Optional=None,
    cluster_idx_out:Optional=None, # only needed for SR1 if not specified, full/in
    search_space:str='2_body_correlator', # ansatz or genetic
    add_constant:bool=False, shift_data:bool=True, VAE_model:Optional=None, # needed for SR2,3
    VAE_params:Optional=None, # needed for SR2,3
    mu_cluster:Optional=None, # needed for SR2,3
    idx_mu_cluster:Optional=None, # needed for SR2,3
):

Wrapper with the SR methods to be used on top of the representation learned by the cpVAE for quantum

Args:

dataset: Dataset object
cluster_idx_in: coord. specifying the location of the cluster we analyse in parameter (theta) space
objective: SR1, SR2 or SR3
cluster_idx_out: coord. specifying the location of the cluster we analyse in parameter (theta) space (only used in SR1)
search_space: 2_body_correlator or genetic
add_constant: if True, add a constant term to the model
shift_data: whether the symbolic function takes dataset.data directly or after the mapping {0,1}->{-1,1}

For SR2 and SR3, the following are also needed:

VAE_model: the VAE model
VAE_params: its params
mu_cluster: the value of the latent variable across theta space where the cluster appears (for now, only one mu)
idx_mu_cluster: index of the latent variable where the cluster appears (for now, only one)
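The {0,1}->{-1,1} mapping mentioned for shift_data is the affine map x -> 2x - 1. A minimal sketch in plain Python, with a hypothetical function name:

```python
def shift_bits_to_spins(samples):
    """Map binary outcomes {0, 1} to spin-like values {-1, +1} via x -> 2x - 1."""
    return [[2 * bit - 1 for bit in row] for row in samples]


bits = [[0, 1, 1], [1, 0, 0]]
spins = shift_bits_to_spins(bits)
```

With +/-1 values, a product of two entries is +1 when they agree and -1 when they differ, which is one reason such a shift is convenient for pairwise correlator terms.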

source

SymbolicRegression.train


def train(
    key:int, dataset_size:int=2000, kwargs:VAR_KEYWORD
):

Dispatch to the training routine that matches the chosen search space.


source

SymbolicRegression.call_pysr


def call_pysr(
    key:PRNGKey, dataset_size:int, random_state:int=2575, # seed for reproducibility
    niterations:int=200, # Number of iterations to search
    binary_operators:list=['+', '*', '-'], # Allowed binary operations
    unary_operators:list=[], # Other allowed operations
    elementwise_loss:str='loss(x,y) = -y*log(1/(1+exp(-x)))-(1-y)*log(1-1/(1+exp(-x)))', # sigmoid loss for SR1
    maxsize:int=20, # max complexity of the equations
    progress:bool=True, # Show progress during training
    extra_sympy_mappings:dict={'C': 'C'}, # Allow PySR to use constants
    batching:bool=True, # batching, usually big dataset
    batch_size:int=500, turbo:bool=True, deterministic:bool=True, # for reproducibility
    parallelism:str='serial'
):

Call PySR with the SR objective.
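The default elementwise_loss string in the signature above is the binary cross-entropy applied to a sigmoid of the raw expression output x, with label y in {0, 1}. The sketch below writes the same formula in Python for reference; the function names are hypothetical, and the expression PySR actually evaluates is the string passed in the signature.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_cross_entropy(x, y):
    """-y * log(sigmoid(x)) - (1 - y) * log(1 - sigmoid(x)), as in the default loss."""
    p = sigmoid(x)
    return -y * math.log(p) - (1 - y) * math.log(1 - p)


loss_uncertain = sigmoid_cross_entropy(0.0, 1)  # p = 0.5, so the loss is log(2)
loss_correct = sigmoid_cross_entropy(2.0, 1)    # confident and right: small loss
loss_wrong = sigmoid_cross_entropy(2.0, 0)      # confident and wrong: large loss
```

This direct form mirrors the loss string but is not numerically robust: for inputs of very large magnitude the exponential overflows or the logarithm receives zero, so a production version would use a stabilized formulation.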


source

SymbolicRegression.train_2BC


def train_2BC(
    key:PRNGKey, dataset_size:int=2000, L1_reg:float=0.0, print_info:bool=True, max_iter:int=500
)->object:

Train the 2 body correlator (2BC) ansatz on the various SR objectives


source

SymbolicRegression.prepare_dataset


def prepare_dataset(
    key:PRNGKey, dataset_size:Optional=2000
)->tuple:

Prepare the dataset, redirect to a method depending on the objective


source

SymbolicRegression.plot_alpha


def plot_alpha(
    topology:list, edge_scale:int=10, name:str='', threshold:float=None
):

plot the 2 body correlator weights alpha_ij


source

SymbolicRegression.compute_prediction


def compute_prediction(
    theta_pair:tuple=(1, 0), values_other_thetas:tuple=()
)->Array:

compute f(x) on the parameter space


source

SymbolicRegression.compute_and_plot_prediction


def compute_and_plot_prediction(
    theta_pair:tuple=(1, 0), values_other_thetas:tuple=(), name:str='', class_pred:bool=False,
    fig_shape:tuple=(3, 3)
)->Array:

compute and plot f(x) on the parameter space


source

SymbolicRegression.reduce_alpha


def reduce_alpha(
    random_state:int, niterations:int=200, binary_operators:list=['+', '*', '/', '-'],
    unary_operators:list=['exp', 'log', 'sin', 'cos', 'tanh'], elementwise_loss:str='loss(x, y) = (x - y)^2',
    maxsize:int=25, deterministic:bool=True, extra_sympy_mappings:dict={'C': 'C'}
)->str:

Use PySR to reduce the alpha coefficients: it searches for a function g(i, j) -> alpha_ij.
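To make the target of this reduction concrete, the sketch below scores one candidate function g(i, j) against a table of coefficients using the squared loss given as the default elementwise_loss. The coefficient values and the candidate 1/|i - j| are made up for illustration and are not results from the module; the function name is hypothetical.

```python
def mean_squared_error(g, pairs, alpha):
    """Average of (g(i, j) - alpha_ij)^2 over all listed pairs."""
    return sum((g(i, j) - a) ** 2 for (i, j), a in zip(pairs, alpha)) / len(alpha)


# Hypothetical coefficients that decay with the distance between sites.
pairs = [(0, 1), (0, 2), (1, 2), (0, 3)]
alpha = [1.0, 0.5, 1.0, 1.0 / 3]


def inverse_distance(i, j):
    """Candidate closed form: alpha_ij = 1 / |i - j|."""
    return 1.0 / abs(i - j)


def constant_one(i, j):
    """A worse candidate that ignores the indices."""
    return 1.0


mse_good = mean_squared_error(inverse_distance, pairs, alpha)
mse_bad = mean_squared_error(constant_one, pairs, alpha)
```

A symbolic regression search explores many such candidates and keeps those with a good trade-off between this error and expression complexity.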