Learning (Approximately) Equivariant Networks via Constrained Optimisation
Andrei Manolache, Luiz F.O. Chamon, Mathias Niepert
A recurring question in machine learning is whether symmetries present in a task or dataset should be explicitly hard-coded into a model.
Even when the underlying data is perfectly equivariant, a sufficiently flexible model is not guaranteed to learn an equivariant solution. Conversely, enforcing equivariance through architectural constraints can complicate optimisation. Moreover, in many practical settings the data is only approximately equivariant, owing to measurement noise or other imperfections.
This paper proposes a principled framework for controlling the degree of equivariance enforced in a model. The key idea is to embed an equivariant architecture within a larger family of models indexed by additional “equivariance-breaking” parameters, so that exact equivariance is recovered when these parameters vanish.
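To make this concrete for myself, here is a minimal sketch of what such a parameterisation could look like. This is my own toy construction, not the architecture used in the paper: a permutation-equivariant, DeepSets-style layer augmented with per-position weights, which is exactly equivariant when those extra weights are zero.

```python
# Toy illustration (my own construction, not the paper's architecture):
# a permutation-equivariant layer plus per-position "breaking" weights.
# When the breaking weights are zero, the layer is exactly equivariant.
import torch
import torch.nn as nn


class RelaxedEquivariantLayer(nn.Module):
    def __init__(self, n_positions: int, d_in: int, d_out: int):
        super().__init__()
        # Shared (position-independent) weights -> permutation-equivariant part
        self.elementwise = nn.Linear(d_in, d_out)
        self.pooled = nn.Linear(d_in, d_out, bias=False)
        # Equivariance-breaking parameters: one weight matrix per position,
        # initialised at zero so training starts from the equivariant model.
        self.breaking = nn.Parameter(torch.zeros(n_positions, d_in, d_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_positions, d_in)
        equivariant = self.elementwise(x) + self.pooled(x.mean(dim=1, keepdim=True))
        # Position-dependent term: breaks permutation symmetry unless zero.
        symmetry_breaking = torch.einsum("bnd,nde->bne", x, self.breaking)
        return equivariant + symmetry_breaking

    def breaking_norm(self) -> torch.Tensor:
        # Magnitude of the deviation from equivariance; this is the quantity
        # a constraint of the paper's flavour would drive to zero or bound.
        return self.breaking.pow(2).sum()
```

Starting the breaking weights at zero means training begins from the exactly equivariant model and only departs from it if doing so helps, which is the behaviour the framework is meant to control.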
By examining the dual of the constrained training problem, the authors derive a minimax formulation in which the model and equivariance-breaking parameters are optimised jointly via gradient descent while the dual variables are updated via gradient ascent. This reformulation provides a mechanism through which the optimisation process itself determines the appropriate degree of equivariance.
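In my own (deliberately loose) notation, rather than the paper’s, the fully equivariant case corresponds to something like the following constrained problem and its Lagrangian, giving the minimax problem that is solved by descent on the primal variables and ascent on the dual variable:

```latex
% Schematic only; my notation, not the paper's.
% \theta: model weights, \epsilon: equivariance-breaking parameters,
% \lambda: dual variable, \ell: training loss.
\min_{\theta,\,\epsilon} \; \ell(\theta, \epsilon)
   \quad \text{s.t.} \quad \|\epsilon\|^2 \le 0
\qquad \longrightarrow \qquad
\min_{\theta,\,\epsilon} \; \max_{\lambda \ge 0} \;
   \ell(\theta, \epsilon) + \lambda \, \|\epsilon\|^2
```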
When the dataset is fully equivariant, the authors show that the equivariance-breaking parameters naturally converge to zero during training, leading the model to recover an equivariant solution without manual enforcement. In situations where the data only approximately respects a symmetry, the framework is extended by relaxing the constraints through slack variables that bound the equivariance-breaking parameters. In this partially equivariant regime, the parameters do not vanish, but the authors derive theoretical guarantees relating their magnitude to the model’s deviation from perfect equivariance.
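Putting the two sketches above together, a primal-dual training loop might look roughly like this. Again, this is my reconstruction under the stated assumptions, not the authors’ code; the tolerance `delta` plays the role of the slack bound, with `delta = 0` corresponding to the fully equivariant case and `delta > 0` permitting a controlled amount of symmetry breaking.

```python
# Primal-dual sketch: gradient descent on the model and breaking parameters,
# gradient ascent on the dual variable. Uses the toy RelaxedEquivariantLayer
# defined above; `delta` is the (assumed) slack bound on the breaking term.
import torch
import torch.nn.functional as F

model = RelaxedEquivariantLayer(n_positions=8, d_in=16, d_out=16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = torch.zeros(1)            # dual variable, projected back to >= 0
dual_lr, delta = 1e-2, 0.0      # delta = 0: enforce exact equivariance

x = torch.randn(32, 8, 16)      # toy batch: (batch, positions, features)
y = torch.randn(32, 8, 16)      # toy regression targets

for step in range(200):
    # Primal step: minimise the Lagrangian w.r.t. the model parameters.
    violation = model.breaking_norm() - delta
    lagrangian = F.mse_loss(model(x), y) + lam.item() * violation
    optimizer.zero_grad()
    lagrangian.backward()
    optimizer.step()

    # Dual step: gradient ascent on lambda, then project onto lambda >= 0.
    lam = torch.clamp(lam + dual_lr * violation.detach(), min=0.0)
```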
Finally, the paper presents empirical results demonstrating that this gradual, data-driven incorporation of equivariance does not hurt performance on downstream tasks. Moreover, training remains stable even when the symmetry in the input data is intentionally degraded, supporting the claim that the method adapts appropriately to the imperfect symmetries found in real-world data.
What I particularly enjoyed about this paper is its introduction of a technique I had not encountered before: framing the incorporation of equivariance as a minimax optimisation derived from the dual of a constrained loss. Enforcing equivariance by adding auxiliary loss terms is challenging, as it inevitably introduces a delicate weight-tuning problem.
This paper’s approach bypasses that issue entirely by integrating the constraint in a principled, optimisation-driven manner. I find this both elegant and practically appealing, and I’m eager to investigate how these ideas might inform my own research.