Compared to standard next-token prediction training, we find that transformers trained to predict multiple steps ahead (and back) learn to navigate mazes efficiently.
Standard next-token training predicts only the single next step; MLM-U predicts multiple steps both ahead and back.
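As a rough illustration of the difference between the two objectives, here is a minimal sketch; the path encoding, `MASK` token, and masking ratio are our own illustrative choices, not the exact MLM-U recipe:

```python
import random

MASK = "<mask>"

def next_token_targets(path):
    # Standard next-token prediction: at each position the model sees
    # the prefix and is trained to predict the single next step.
    return [(path[:i], path[i]) for i in range(1, len(path))]

def mlmu_targets(path, mask_ratio=0.5, rng=random):
    # MLM-U-style objective (sketch): mask a random subset of steps
    # anywhere in the path. The model sees the surviving steps on both
    # sides of each blank and must recover every masked step, i.e. it
    # predicts multiple steps ahead and back at once.
    n_masked = max(1, int(mask_ratio * len(path)))
    masked_idx = set(rng.sample(range(len(path)), n_masked))
    inputs = [MASK if i in masked_idx else step for i, step in enumerate(path)]
    targets = {i: path[i] for i in masked_idx}
    return inputs, targets

path = ["(0,0)", "(0,1)", "(1,1)", "(2,1)", "(2,2)"]
print(next_token_targets(path)[0])  # (['(0,0)'], '(0,1)')
print(mlmu_targets(path))
```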
MLM-U navigates mazes more accurately, including more complex mazes on grids as large as 30x30.
We show full-path navigation accuracy on new mazes (not seen during training) across various grid sizes.
Both models are transformers with an identical number of parameters trained on the same data.
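For concreteness, the metric can be read as exact-match accuracy over whole solution paths; a minimal sketch, assuming a maze counts as solved only when every predicted step matches the reference:

```python
def full_path_accuracy(predicted_paths, reference_paths):
    # A maze counts as solved only if the entire predicted step sequence
    # matches the reference solution exactly; partially correct paths
    # score zero. (The exact-match criterion is our reading of
    # "full-path navigation accuracy".)
    assert len(predicted_paths) == len(reference_paths)
    solved = sum(p == r for p, r in zip(predicted_paths, reference_paths))
    return solved / len(reference_paths)

# e.g. one of two held-out mazes solved end to end -> 0.5
print(full_path_accuracy([["(0,0)", "(0,1)"], ["(0,0)"]],
                         [["(0,0)", "(0,1)"], ["(0,0)", "(1,0)"]]))
```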
We find MLM-U training can outperform much larger (10x) next-token models, and even models trained with additional supervision from A* search traces.
We show example 10x10 mazes. On the left is an example of an A* search-generated maze from Lehnert et al.; on the right is a depth-first search (DFS) generated maze, which produces longer solution paths.
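A DFS generator of the kind shown on the right can be sketched as a standard recursive backtracker; the function below is a hypothetical helper, not the paper's exact generator, but it shows why DFS tends to carve long, winding solution paths:

```python
import random

def dfs_maze(n, seed=0):
    # Recursive-backtracker (depth-first search) maze generation on an
    # n x n grid: carve a passage to a random unvisited neighbour and
    # backtrack when stuck. Because DFS keeps extending one corridor for
    # as long as it can, the resulting start-to-goal solution paths tend
    # to be long and winding.
    rng = random.Random(seed)
    visited = {(0, 0)}
    passages = set()            # undirected edges between adjacent cells
    stack = [(0, 0)]
    while stack:
        r, c = stack[-1]
        neighbours = [(r + dr, c + dc)
                      for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= r + dr < n and 0 <= c + dc < n
                      and (r + dr, c + dc) not in visited]
        if neighbours:
            nxt = rng.choice(neighbours)
            passages.add(frozenset(((r, c), nxt)))
            visited.add(nxt)
            stack.append(nxt)
        else:
            stack.pop()
    return passages

maze = dfs_maze(10)  # a 10x10 maze like the right-hand example
```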
On simple 5x5 mazes, where next-token prediction can also solve the task, MLM-U converges 2x faster in terms of GPU hours.