Transformers Can Navigate Mazes With Multi-Step Prediction

Compared to standard next token prediction training, we find transformers trained to predict multiple steps ahead (and back) learn to navigate mazes efficiently.


Maze Navigation Performance

Standard next token training predicts a single next step; MLM-U predicts multiple steps ahead and back.

MLM-U navigates mazes more accurately, including complex mazes with grids as large as 30x30.
We show full-path navigation accuracy on held-out mazes (not seen during training) across grid sizes. Both models are transformers with an identical number of parameters trained on the same data.
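To make the contrast concrete, here is an illustrative sketch (not the authors' code) of which targets each objective supervises on a tokenized solution path. The cell IDs and the `<mask>` token are hypothetical; MLM-U-style masking is shown as predicting masked positions from surrounding context, so supervision flows both forward (steps ahead) and backward (steps back).

```python
# Hypothetical tokenized maze path: cell IDs between start and goal.
path = ["start", "c3", "c4", "c9", "c14", "goal"]

def next_token_targets(seq):
    """Standard next-token training: at each position t, predict token t+1
    from the left-to-right prefix only."""
    return [(seq[:t + 1], seq[t + 1]) for t in range(len(seq) - 1)]

def mlmu_targets(seq, masked_positions):
    """MLM-U-style sketch: mask several positions and predict each masked
    token from the remaining (bidirectional) context."""
    context = [tok if i not in masked_positions else "<mask>"
               for i, tok in enumerate(seq)]
    return [(context, i, seq[i]) for i in sorted(masked_positions)]

# Next-token: "c3" is predicted from the prefix ["start"] alone.
print(next_token_targets(path)[0])
# MLM-U sketch: "c4" and "c14" are each predicted from context on both sides.
print(mlmu_targets(path, {2, 4}))
```

The key difference the sketch highlights: the next-token objective only ever conditions on earlier steps, while the masked multi-step objective conditions each prediction on both past and future cells of the path.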


We find MLM-U training can outperform next token models with 10x more parameters, and even models trained with additional supervision from A* search traces.
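For context on what "A* search trace" supervision looks like, here is a minimal A* sketch on a small grid (this is a generic textbook implementation, not the setup from the paper): the sequence of expanded cells forms the trace, and the backpointer chain forms the solution path.

```python
import heapq

def astar_trace(start, goal, walls, width, height):
    """A* on a unit-cost grid with a Manhattan-distance heuristic.
    Returns (trace, path): `trace` is the order cells were expanded
    (the kind of intermediate search supervision referenced above),
    `path` is the shortest start-to-goal path, or None if unreachable."""
    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start, [start])]  # (f, g, cell, path so far)
    expanded = set()
    trace = []
    while frontier:
        _, g, cell, path = heapq.heappop(frontier)
        if cell in expanded:
            continue
        expanded.add(cell)
        trace.append(cell)
        if cell == goal:
            return trace, path
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in walls and nxt not in expanded):
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return trace, None

# 3x3 grid with one blocked cell: the optimal path routes around (1, 1).
trace, path = astar_trace((0, 0), (2, 2), walls={(1, 1)}, width=3, height=3)
```

Training on such traces gives the model step-by-step search supervision; the result above suggests MLM-U's masked multi-step objective can match or beat that without it.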


Example Mazes

We show example 10x10 mazes. On the left is an A* Search generated maze from Lehnert et al.; on the right is a Depth First Search generated maze, which produces longer solution paths.
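The depth-first search construction can be sketched with a standard recursive-backtracker generator (a generic algorithm, not the paper's exact code). Its long, winding corridors are what make DFS mazes yield longer solution paths than A*-generated ones.

```python
import random

def generate_maze_dfs(width, height, seed=0):
    """Recursive-backtracker (DFS) maze generation sketch.
    Returns a dict mapping each cell (x, y) to the set of neighboring
    cells it connects to, i.e. the walls that were removed."""
    rng = random.Random(seed)
    passages = {(x, y): set() for x in range(width) for y in range(height)}
    visited = {(0, 0)}
    stack = [(0, 0)]
    while stack:
        x, y = stack[-1]
        # Unvisited grid neighbors of the current cell.
        neighbors = [(x + dx, y + dy)
                     for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if (x + dx, y + dy) in passages
                     and (x + dx, y + dy) not in visited]
        if neighbors:
            nxt = rng.choice(neighbors)      # carve a passage to a random neighbor
            passages[(x, y)].add(nxt)
            passages[nxt].add((x, y))
            visited.add(nxt)
            stack.append(nxt)                # go deeper (this is the DFS bias)
        else:
            stack.pop()                      # dead end: backtrack
    return passages

maze = generate_maze_dfs(10, 10)
```

Because the generator always carves toward an unvisited neighbor before backtracking, the result is a spanning tree over the grid with few branches, so the unique path between any two cells tends to be long.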


Training Efficiency

When next token prediction can also solve the maze, as in the simple 5x5 case, MLM-U converges 2x faster in terms of GPU hours.