Using Large-Scale Imitation Learning for MAPF

From a diverse set of maps, we can use existing strong (centralized) heuristic search solvers to solve instances with hundreds of agents. This allows us to easily collect thousands of MAPF solutions. Each solution is a sequence of timesteps, and each timestep contains tens to hundreds of agents, each labeled with its next action.
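As a minimal sketch of how one solver output could be turned into per-timestep training samples, assume a solution is given as one path per agent, where each path is a list of (row, col) cells and an agent that has reached its goal waits there. The function name, the path format, and the five-action encoding below are illustrative assumptions, not details taken from any particular solver.

```python
# Hypothetical action encoding (an assumption; no encoding is fixed by the text).
# Keys are (row delta, col delta) between consecutive cells.
ACTIONS = {(0, 0): 0, (-1, 0): 1, (1, 0): 2, (0, -1): 3, (0, 1): 4}


def solution_to_samples(paths):
    """Turn one MAPF solution into a list of (positions, labels) per timestep.

    paths: list of per-agent paths, each a list of (row, col) cells.
    Agents with shorter paths are assumed to wait at their final cell.
    """
    horizon = max(len(p) for p in paths)
    samples = []
    for t in range(horizon - 1):
        positions, labels = [], []
        for path in paths:
            cur = path[min(t, len(path) - 1)]
            nxt = path[min(t + 1, len(path) - 1)]
            delta = (nxt[0] - cur[0], nxt[1] - cur[1])
            positions.append(cur)
            labels.append(ACTIONS[delta])  # next-action label for this agent
        samples.append((positions, labels))
    return samples


if __name__ == "__main__":
    demo_paths = [[(0, 0), (0, 1), (0, 2)], [(1, 0), (0, 0)]]
    for positions, labels in solution_to_samples(demo_paths):
        print(positions, labels)
```

Each returned pair holds the positions of all agents at one timestep together with their action labels, so a single solution with T timesteps yields T - 1 supervised samples.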
We can then follow existing work and use a Graph Neural Network (GNN) in which each agent is a vertex and an edge connects any two agents that are within each other's local field of view (FoV) and can communicate with each other. The GNN is trained to predict the action label of each agent.
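The graph construction described above might be sketched as follows. This assumes a square FoV measured by Chebyshev distance on the grid and a hypothetical `fov_radius` parameter; the actual FoV shape and communication rule depend on the model being followed. Only the edge list is built here, and the GNN layers that consume it are omitted.

```python
def build_edges(positions, fov_radius):
    """Return directed edges (i, j) between agents within each other's FoV.

    positions: list of (row, col) cells, one per agent (vertex).
    fov_radius: assumed half-width of a square field of view, in cells.
    """
    edges = []
    for i, (ri, ci) in enumerate(positions):
        for j, (rj, cj) in enumerate(positions):
            if i == j:
                continue  # no self-loops
            # Chebyshev distance: agent j lies inside the square FoV of agent i.
            if max(abs(ri - rj), abs(ci - cj)) <= fov_radius:
                edges.append((i, j))
    return edges


if __name__ == "__main__":
    demo_positions = [(0, 0), (0, 2), (5, 5)]
    print(build_edges(demo_positions, fov_radius=2))
```

Because the distance test is symmetric, every edge appears in both directions, which matches the idea that the two agents can communicate with each other. The double loop is quadratic in the number of agents; for hundreds of agents per timestep a spatial grid lookup would likely be preferable.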