NeurIPS 2022: Paper review #1
G-Research were headline sponsors at NeurIPS 2022, in New Orleans.
ML is a fast-evolving discipline; attending conferences like NeurIPS and keeping up-to-date with the latest developments is key to the success of our quantitative researchers and machine learning engineers.
Our NeurIPS 2022 paper review series gives you the opportunity to hear about the research and papers that our quants and ML engineers found most interesting from the conference.
Here, Sebastian L, Quantitative Researcher at G-Research, discusses two papers from NeurIPS:
- Focal Modulation Networks
- Reconstructing Training Data from Trained Neural Networks
Focal Modulation Networks
Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao
This paper proposes a new general-purpose image processing architecture, which combines ideas from image transformers and convolutional architectures into a new module the authors name focal modulation.
At a high level, the key idea of the paper is to reverse the order of the two steps found in self-attention architectures: context aggregation, and the interaction between each local query and its surrounding (up to global) context.
Focal modulation aggregates context first and only then lets each query interact with the aggregated context, in line with the authors’ aim of building a more compute-efficient architecture.
Hierarchical context aggregation is performed by an independent subnetwork. This consists of stacked depth-wise convolutions, the results of which are gated and summed into a modulation feature map. This map interacts with the query feature map in a pointwise fashion.
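The aggregation and modulation steps described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function and parameter names (`focal_modulation`, `init_params`, `w_in`, `w_h`, `w_out`) are hypothetical, and biases, normalisation and any other details of the published architecture are omitted.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def depthwise_conv(x, kernel):
    # x: (H, W, C), kernel: (k, k, C) with odd k; zero padding keeps the spatial size
    k = kernel.shape[0]
    p = k // 2
    H, W, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            out += xp[i:i + H, j:j + W, :] * kernel[i, j, :]
    return out

def init_params(C, kernel_sizes, seed=0):
    # hypothetical random initialisation, for illustration only
    rng = np.random.default_rng(seed)
    L = len(kernel_sizes)
    return {
        "w_in": 0.1 * rng.normal(size=(C, 2 * C + L + 1)),
        "dw_kernels": [0.1 * rng.normal(size=(k, k, C)) for k in kernel_sizes],
        "w_h": 0.1 * rng.normal(size=(C, C)),
        "w_out": 0.1 * rng.normal(size=(C, C)),
    }

def focal_modulation(x, params):
    # x: (H, W, C) feature map
    C = x.shape[-1]
    L = len(params["dw_kernels"])
    # one linear projection yields the query, the initial context and L + 1 gates
    proj = x @ params["w_in"]
    q, ctx, gates = proj[..., :C], proj[..., C:2 * C], proj[..., 2 * C:]

    # 1) hierarchical context aggregation: stacked depth-wise convolutions,
    #    each level gated and summed into one modulation feature map
    ctx_all = np.zeros_like(ctx)
    for level, kernel in enumerate(params["dw_kernels"]):
        ctx = gelu(depthwise_conv(ctx, kernel))
        ctx_all += ctx * gates[..., level:level + 1]
    # a global average of the last level acts as the coarsest context
    ctx_global = gelu(ctx.mean(axis=(0, 1), keepdims=True))
    ctx_all += ctx_global * gates[..., L:L + 1]

    # 2) interaction: the modulator meets the query pointwise
    modulator = ctx_all @ params["w_h"]
    return (q * modulator) @ params["w_out"]

x = np.random.default_rng(1).normal(size=(6, 5, 4))
params = init_params(C=4, kernel_sizes=[3, 5])
y = focal_modulation(x, params)
print(y.shape)
```

Note that no query-key similarity matrix is ever formed: the cost is dominated by the depth-wise convolutions and pointwise products, which is where the compute savings relative to self-attention would come from.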
The paper reports state-of-the-art results for the new focal modulation module compared with modern self-attention-based vision architectures on classification and segmentation tasks, including ImageNet-1K and ImageNet-22K, as well as on the COCO segmentation challenge.
The authors demonstrate on multiple examples that the modulation feature map naturally learns to attend to semantically meaningful image regions, such as the foreground or objects of interest.
Reconstructing Training Data from Trained Neural Networks
Niv Haim, Gal Vardi, Gilad Yehudai, Ohad Shamir, Michal Irani
This paper demonstrates that it is possible to reconstruct a large portion of the training data from the weights of a network trained on popular vision data sets, such as CIFAR.
The images reconstructed from the trained network often match actual data set samples well and, at the pixel level, the reconstructions are impressively precise. The reconstruction algorithm is based on a theoretical insight for homogeneous neural networks, which characterises the learned network weights as a critical point of a constrained optimisation problem.
This insight was first introduced in the deep learning literature in the study of the implicit bias of gradient flows. The authors observe that this result allows the trained weights of the network to be written as a linear combination of the gradients of the network output with respect to the weights, evaluated at each training data point in the data set.
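To make the linear-combination property concrete, here is a minimal sketch for a two-layer ReLU network without bias terms, assuming binary labels in {-1, +1} and non-negative coefficients `lam` (one per training point). The network, the function names and the exact form of the residual are illustrative assumptions, not the authors' code. The residual measures how far the weights are from being a weighted sum of per-sample gradients; a reconstruction attack along these lines would treat the inputs `X` and the coefficients `lam` as unknowns and minimise it.

```python
import numpy as np

def net(W, v, x):
    # two-layer ReLU network without bias terms: v^T relu(W x)
    return v @ np.maximum(W @ x, 0.0)

def grad_theta(W, v, x):
    # analytic gradients of the network output with respect to W and v
    pre = W @ x
    active = (pre > 0).astype(float)
    grad_W = np.outer(v * active, x)
    grad_v = np.maximum(pre, 0.0)
    return grad_W, grad_v

def stationarity_residual(W, v, X, y, lam):
    # squared norm of: theta - sum_i lam_i * y_i * grad_theta net(theta; x_i)
    res_W, res_v = W.copy(), v.copy()
    for x_i, y_i, lam_i in zip(X, y, lam):
        g_W, g_v = grad_theta(W, v, x_i)
        res_W -= lam_i * y_i * g_W
        res_v -= lam_i * y_i * g_v
    return np.sum(res_W**2) + np.sum(res_v**2)

# Hand-built toy case: one hidden unit, one training point.
# With x = (3, 4), lam = 1/||x|| = 0.2, v = 1 and W = v * x / ||x||,
# the weights are exactly lam * y * gradient, so the residual vanishes.
W = np.array([[0.6, 0.8]])
v = np.array([1.0])
X = np.array([[3.0, 4.0]])
y = np.array([1.0])

exact = stationarity_residual(W, v, X, y, np.array([0.2]))
wrong = stationarity_residual(W, v, X, y, np.array([0.5]))
print(exact, wrong)
```

In the toy case the residual is zero only for the correct coefficient, which is the signal a reconstruction procedure can exploit when the training points themselves are the unknowns.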
They use this to set up a minimisation problem that reconstructs all critical training samples simultaneously. The assumption that the network is homogeneous is satisfied, for example, by ReLU networks without bias terms, which are positively homogeneous in their weights (one-homogeneous in each layer's weights).
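The homogeneity assumption is easy to check numerically. The sketch below uses a hypothetical three-layer bias-free ReLU network (`relu_net`) with random weights: scaling a single layer by a positive factor scales the output by the same factor, and scaling all layers jointly scales the output by that factor raised to the number of layers.

```python
import numpy as np

def relu_net(weights, x):
    # fully connected ReLU network without bias terms; last layer is linear
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)
    return weights[-1] @ h

rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), rng.normal(size=(1, 8))]
x = rng.normal(size=4)
alpha = 1.7  # any positive scaling factor

# scale only the first layer: output scales by alpha
scaled_first = [alpha * weights[0]] + weights[1:]
# scale every layer: output scales by alpha ** (number of layers)
scaled_all = [alpha * W for W in weights]

base = relu_net(weights, x)
print(np.allclose(relu_net(scaled_first, x), alpha * base))
print(np.allclose(relu_net(scaled_all, x), alpha ** len(weights) * base))
```

Adding bias terms breaks this property in general, because the biases do not pass through the same multiplicative path as the inputs.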
This paper shows a surprising result based on a theoretical insight into the training dynamics of neural networks. The result has potential implications for the discussion on privacy preservation in neural networks, demonstrating it is possible to extract actual training samples from the weights of trained neural networks.