03 - Generalizing Anchor Regression to Multivariate Algorithms

Apr 3, 2023

Here we summarise the work of Durand et al. (2025). We recommend reading the original paper for a fuller treatment of the ideas sketched below.

Introduction

We continue working within the worst-case risk framework, but now consider a general loss function $\mathcal{L}(X, Y; \theta)$, whose form will be clarified later. The goal becomes solving:

$$ \arg\min_{\theta} \sup_{Q \in \mathbb{Q}} \mathbb{E}[\mathcal{L}(X, Y; \theta)]. $$

Assume the observed variables $(X, Y)$ follow the linear SCM from Rothenhäusler et al. (2021):

$$ \begin{pmatrix} X \\ Y \\ H \end{pmatrix} = \mathbf{B} \begin{pmatrix} X \\ Y \\ H \end{pmatrix} + \varepsilon + \mathbf{M} A. $$

Since the graph is acyclic, $(I - \mathbf{B})$ is invertible and we have:

$$ \begin{pmatrix} X \\ Y \\ H \end{pmatrix} = (I - \mathbf{B})^{-1}(\varepsilon + \mathbf{M} A), $$

or, more simply, writing $\mathbf{D}$ for the rows of $(I - \mathbf{B})^{-1}$ corresponding to $(X, Y)$,

$$ \begin{pmatrix} X \\ Y \end{pmatrix} = \mathbf{D}(\varepsilon + \mathbf{M} A). $$
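
To make the structural equations concrete, here is a minimal numpy sketch of such an SCM. The dimensions, graph, and coefficients are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: X in R^2, Y in R, one hidden H, anchor A in R^2.
d, p, h, q = 2, 1, 1, 2
n = d + p + h

# Hypothetical acyclic SCM, variables stacked in the order (X1, X2, Y, H).
B = np.zeros((n, n))
B[2, 0], B[2, 1] = 0.8, -0.5                 # X -> Y
B[0, 3], B[1, 3], B[2, 3] = 1.0, 0.5, 0.3    # H -> X, H -> Y
M = rng.normal(size=(n, q))                  # how the anchor enters the system

# D: the rows of (I - B)^{-1} corresponding to (X, Y).
D = np.linalg.inv(np.eye(n) - B)[: d + p]

# One draw of (X, Y) = D (eps + M A).
eps = rng.normal(size=n)
A = rng.normal(size=q)
xy = D @ (eps + M @ A)
```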

Bounding Intervention Covariance

From this, we can write the covariance matrix of $\begin{pmatrix} X \\ Y \end{pmatrix}$ as:

$$ \Sigma_{XY} = \mathbf{D}\Sigma_{\varepsilon}\mathbf{D}^\top + \mathbf{D}\mathbf{M}\Sigma_{A}\mathbf{M}^\top \mathbf{D}^\top, $$

assuming $\varepsilon$ and $A$ are independent. Similarly, the covariance under intervention becomes:

$$ \Sigma_{XY}^{do(A := \nu)} = \mathbf{D}\Sigma_{\varepsilon}\mathbf{D}^\top + \mathbf{D}\mathbf{M}\Sigma_{\nu}\mathbf{M}^\top \mathbf{D}^\top. $$
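
Continuing the sketch above, both covariance formulas translate directly into code; the identity covariances for $\varepsilon$ and $A$ are assumed for illustration.

```python
# Population covariances implied by the SCM (eps independent of A).
Sigma_eps = np.eye(n)   # assumed noise covariance
Sigma_A = np.eye(q)     # assumed anchor covariance

Sigma_XY = D @ Sigma_eps @ D.T + D @ M @ Sigma_A @ M.T @ D.T

# Under do(A := nu) for a fixed shift nu, Sigma_nu = nu nu^T.
nu = np.array([0.5, -0.2])
Sigma_nu = np.outer(nu, nu)
Sigma_XY_do = D @ Sigma_eps @ D.T + D @ M @ Sigma_nu @ M.T @ D.T
```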

With the set of bounded interventions:

$$ \mathbb{Q}^{\text{anchor}} = \left\{ P^\nu \mid \nu \nu^\top \preceq \gamma \, \mathbb{E}_P[AA^\top] \right\}, $$

we obtain for all $P \in \mathbb{Q}^{\text{anchor}}$:

$$ \Sigma_{XY}^{do(A := \nu)} \preceq \mathbf{D}\Sigma_{\varepsilon}\mathbf{D}^\top + \gamma \mathbf{D}\mathbf{M} \Sigma_{A} \mathbf{M}^\top \mathbf{D}^\top. $$

The inequality holds because $\nu \nu^\top \preceq \gamma\, \Sigma_A$ is preserved under conjugation by $\mathbf{D}\mathbf{M}$. Hence $\Sigma_{XY}$ is bounded across this entire intervention family, which is valuable since many multivariate methods depend on the data only through $\Sigma_{XY}$.
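
The bound can be checked numerically by continuing the sketch: for $\gamma = 4$ and $\Sigma_A = I$, the intervention above satisfies $\nu \nu^\top \preceq \gamma\, \Sigma_A$ (since $\|\nu\|^2 \le \gamma$), and the gap between the bound and $\Sigma_{XY}^{do(A := \nu)}$ is positive semi-definite.

```python
gamma = 4.0   # nu nu^T ⪯ gamma * Sigma_A holds here since ||nu||^2 <= gamma

bound = D @ Sigma_eps @ D.T + gamma * (D @ M @ Sigma_A @ M.T @ D.T)

# PSD ordering: the smallest eigenvalue of (bound - Sigma_XY_do) is >= 0.
gap = bound - Sigma_XY_do
print(np.linalg.eigvalsh(gap).min() >= -1e-10)   # True
```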

Anchor-Compatible Losses

We define a class of losses suitable for anchor regularisation.

Definition (Anchor-compatible loss):
A loss function $\mathcal{L}(X, Y; \theta)$ is anchor-compatible if it can be written as:

$$ \mathcal{L}(X, Y; \theta) = f_{\theta}(C_{XY}), $$

where $f_\theta : \mathbb{R}^{(d+p) \times (d+p)} \to \mathbb{R}$ is a linear map, with $X \in \mathbb{R}^d$, $Y \in \mathbb{R}^p$, and $C_{XY} = \begin{pmatrix} X \\ Y \end{pmatrix} \otimes \begin{pmatrix} X \\ Y \end{pmatrix}$ the outer product of the stacked vector $(X, Y)$.
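
For example, the MLR loss $\lVert Y - \mathbf{W}^\top X \rVert_F^2$ is anchor-compatible: with $S = \begin{pmatrix} -\mathbf{W}^\top & I \end{pmatrix}$, it equals $\text{tr}(S\, C_{XY}\, S^\top)$, which is linear in $C_{XY}$. A small sketch reusing the variables above (the helper name `f_theta` is ours):

```python
def f_theta(C, W):
    """MLR loss as a linear map of C = (X, Y) ⊗ (X, Y).

    With S = [-W^T, I], we have ||Y - W^T X||^2 = tr(S C S^T),
    which is linear in C, so E[L] = f_theta(Sigma_XY, W).
    """
    S = np.hstack([-W.T, np.eye(p)])
    return np.trace(S @ C @ S.T)

W = rng.normal(size=(d, p))
C_xy = np.outer(xy, xy)          # one realisation of C_XY
print(f_theta(C_xy, W))          # pointwise loss
print(f_theta(Sigma_XY, W))      # population risk
```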

Out-of-Distribution Generalisation

For anchor-compatible losses, we have the following guarantee:

Let $(X, Y, H)$ follow the SCM above and let $\mathcal{L}(X, Y; \theta)$ be anchor-compatible. Then, for any $\theta$ and $\gamma > 0$:

$$ \sup_{P \in \mathbb{Q}^{\text{anchor}}} \mathbb{E}_P[\mathcal{L}(X, Y; \theta)] = f_\theta(\Sigma_{XY}) + (\gamma - 1) f_\theta(\Sigma_{XY|A}), $$

where $\Sigma_{XY|A}$ is the covariance of the residuals of $(X, Y)$ after linear regression on the anchor $A$.
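
In practice both covariances can be estimated from a sample, with $\Sigma_{XY|A}$ obtained from the residuals of regressing $(X, Y)$ on $A$. A sketch continuing the simulation above:

```python
# Draw a sample and estimate Sigma_XY and Sigma_{XY|A}.
N = 100_000
A_s = rng.normal(size=(N, q))
eps_s = rng.normal(size=(N, n))
Z = (eps_s + A_s @ M.T) @ D.T    # rows are i.i.d. draws of (X, Y)

Sigma_hat = Z.T @ Z / N

# Residuals of the linear regression of (X, Y) on A.
Pi = np.linalg.lstsq(A_s, Z, rcond=None)[0]
R = Z - A_s @ Pi
Sigma_res = R.T @ R / N          # estimate of Sigma_{XY|A}

# Worst-case (anchor-regularised) risk for the MLR loss above.
worst_case = f_theta(Sigma_hat, W) + (gamma - 1) * f_theta(Sigma_res, W)
```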

Robustness of Multivariate Analysis Algorithms

This framework yields robust versions of popular multivariate methods such as Multiple Linear Regression (MLR), Orthogonal PLS (OPLS), Reduced Rank Regression (RRR), and Partial Least Squares (PLS). However, some algorithms, such as Canonical Correlation Analysis (CCA), are not anchor-compatible: their constraints involve the covariance matrices $C_X$ and $C_Y$ themselves, so the objective cannot be written as a linear map of $C_{XY}$ and the guarantee above does not apply.

| Method | Loss | Constraints | Compatible |
|--------|------|-------------|------------|
| MLR | $\lVert Y - \mathbf{W}^\top X \rVert_F^2$ | — | ✓ |
| OPLS | $\lVert Y - \mathbf{U} \mathbf{V}^\top X \rVert_F^2$ | $\mathbf{U}^\top \mathbf{U} = \mathbf{I}$ | ✓ |
| RRR | $\lVert Y - \mathbf{W} X \rVert_F^2$ | $\text{rank}(\mathbf{W}) = \rho$ | ✓ |
| PLS | $-\text{tr}( \mathbf{W}_x^\top X^\top Y \mathbf{W}_y )$ | $\mathbf{W}_x^\top \mathbf{W}_x = \mathbf{I},\ \mathbf{W}_y^\top \mathbf{W}_y = \mathbf{I}$ | ✓ |
| CCA | $-\text{tr}( \mathbf{W}_x^\top X^\top Y \mathbf{W}_y )$ | $\mathbf{W}_x^\top C_X \mathbf{W}_x = \mathbf{I},\ \mathbf{W}_y^\top C_Y \mathbf{W}_y = \mathbf{I}$ | ✗ |
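
For MLR specifically, the resulting objective can be minimised with the data transformation of Rothenhäusler et al. (2021): replace each variable by $(I - \Pi_A)\,\cdot\, + \sqrt{\gamma}\,\Pi_A\,\cdot$, where $\Pi_A$ projects onto the column span of $A$, and run ordinary least squares on the transformed data. A sketch on the same simulated data (the helper `anchor_transform` is ours):

```python
X_s, Y_s = Z[:, :d], Z[:, d:]

def anchor_transform(V, A, gamma):
    """(I - Pi_A) V + sqrt(gamma) Pi_A V, without forming the N x N projector."""
    fitted = A @ np.linalg.lstsq(A, V, rcond=None)[0]   # Pi_A V
    return V - fitted + np.sqrt(gamma) * fitted

Xt = anchor_transform(X_s, A_s, gamma)
Yt = anchor_transform(Y_s, A_s, gamma)
W_anchor = np.linalg.lstsq(Xt, Yt, rcond=None)[0]   # anchor-regularised MLR fit
```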

Discussion

We showed that anchor regularisation applies to a broader class of loss functions beyond least squares. Any loss expressible as a linear map of the joint second-moment matrix $C_{XY}$ can be anchor-regularised by adding a simple causal regularisation term. This extends OOD generalisation guarantees to many standard multivariate learning algorithms.