einspace: Searching for Neural Architectures from Fundamental Operations
Linus Ericsson, Miguel Espinosa, Chenhongyi Yang, Antreas Antoniou, Amos Storkey, Shay B. Cohen, Steven McDonagh, Elliot J. Crowley
While the transformer architecture has dominated much of the machine learning landscape in recent years, new model architectures always hold the promise of improving accuracy or efficiency on a particular machine learning task.
This paper introduces a new architecture search space (known as einspace, by analogy to the einsum operation) over which to perform automated neural architecture search (NAS). The search space is defined by a parameterised probabilistic context-free grammar over a range of branching, aggregation, routing and computation operations.
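To make the idea concrete, here is a minimal sketch (in Python) of what a probabilistic context-free grammar over network operations could look like and how one might sample an architecture description from it. The nonterminals, operations and probabilities below are illustrative placeholders, not the paper's actual einspace grammar.

```python
import random

# Toy probabilistic context-free grammar over network operations.
# Each nonterminal maps to a list of (production, probability) pairs.
# These symbols and weights are assumptions for illustration only.
GRAMMAR = {
    "NET": [
        (["branch(", "NET", ",", "NET", ")", "->", "AGG"], 0.3),  # branching then aggregation
        (["route(", "NET", ")"], 0.2),                            # routing / reshaping
        (["COMP"], 0.5),                                          # a computation leaf
    ],
    "AGG": [
        (["add"], 0.5),
        (["concat"], 0.5),
    ],
    "COMP": [
        (["conv3x3"], 0.4),
        (["linear"], 0.3),
        (["attention"], 0.3),
    ],
}

def sample(symbol="NET", max_depth=6):
    """Recursively expand a nonterminal by sampling productions."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit as-is
    productions, probs = zip(*GRAMMAR[symbol])
    if max_depth <= 0:
        # Fall back to the last production, which bottoms out quickly,
        # so that sampling always terminates.
        chosen = productions[-1]
    else:
        chosen = random.choices(productions, weights=probs, k=1)[0]
    out = []
    for tok in chosen:
        out.extend(sample(tok, max_depth - 1))
    return out

if __name__ == "__main__":
    # e.g. "branch( conv3x3 , attention ) -> add"
    print(" ".join(sample()))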
While many NAS search spaces are either overly restrictive or require the search algorithm to reinvent basic operations from first principles, einspace is flexible enough to encode many popular architectures including ResNets, transformers and MLP-Mixer while still retaining relatively high-level building blocks.
Using a simple evolutionary algorithm that mutates the best-performing architectures in a population according to the production rules of the probabilistic context-free grammar, einspace was competitive with many more complex NAS methods. It proved especially effective when used to mutate existing state-of-the-art architectures, improving them on almost every task the authors tested.
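As a rough picture of how such a search might proceed, the sketch below evolves toy derivation trees by repeatedly replacing random subtrees with freshly sampled ones. The grammar, mutation scheme and fitness function here are all placeholder assumptions, not the authors' implementation; in practice the fitness would be validation accuracy after training each candidate.

```python
import copy
import random

# Placeholder grammar: a network is either a branch, a routed subnetwork, or a leaf op.
RULES = {
    "NET": [["branch", "NET", "NET"], ["route", "NET"],
            ["conv3x3"], ["attention"], ["linear"]],
}

def grow(symbol="NET", depth=4):
    """Sample a derivation tree by recursively expanding nonterminals."""
    if symbol not in RULES:
        return symbol
    rule = random.choice(RULES[symbol] if depth > 0 else RULES[symbol][2:])  # leaves only at depth 0
    return [rule[0]] + [grow(s, depth - 1) for s in rule[1:]]

def nodes(tree, path=()):
    """Yield (path, subtree) pairs for every subtree in the derivation."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def mutate(tree):
    """Grammar-guided mutation: replace a random subtree with a freshly sampled one."""
    tree = copy.deepcopy(tree)
    path, _ = random.choice(list(nodes(tree)))
    if not path:
        return grow()  # mutating the root resamples the whole architecture
    parent = tree
    for idx in path[:-1]:
        parent = parent[idx]
    parent[path[-1]] = grow(depth=3)
    return tree

def fitness(tree):
    """Placeholder objective; real NAS would train and evaluate the architecture."""
    return -abs(sum(1 for _ in nodes(tree)) - 12)  # toy: prefer trees of ~12 nodes

def evolve(pop_size=20, generations=10, top_k=5):
    population = [grow() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:top_k]                                   # keep the best architectures
        children = [mutate(random.choice(parents)) for _ in range(pop_size - top_k)]
        population = parents + children                                # next generation
    return max(population, key=fitness)

if __name__ == "__main__":
    print(evolve())
```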
It will be interesting to see how einspace performs when more sophisticated and efficient search algorithms are developed for it, and whether the search space can be further extended to include recurrent network architectures. If nothing else, the paper is worth looking at just for all of the pretty pictures of different networks’ einspace representations.