The Barnes-Hut Algorithm for LLMs
Currently the main issue with LLMs is that attention cost grows as N^2, where N is the context size.
The problem looks very similar to an N-body gravity simulation.
Yet for gravity we can use the Barnes-Hut approximation, where far-away bodies are grouped together and treated as a single body (their centre of mass), bringing the cost down from N^2 to N*log(N).
Hence the question: is it possible to adapt Barnes-Hut to the attention matrix of LLMs?
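To make the question concrete, here is a minimal sketch of what a Barnes-Hut-style attention could look like. Everything in it is an assumption on my part: token position plays the role of spatial distance, the mean of a span's keys/values plays the role of the centre of mass, and bh_attention / theta are made-up names, not anything from an existing library or model.

```python
# Minimal sketch: Barnes-Hut-style approximate attention over a 1-D token axis.
# Far-away spans of tokens are collapsed into a single mean-pooled "pseudo-token"
# when they are small relative to their distance from the query (the opening
# criterion), otherwise the span is split and we recurse -- just like opening a
# node in a Barnes-Hut tree.
import numpy as np

def bh_attention(q, K, V, q_pos, lo=0, hi=None, theta=0.5):
    """Approximate softmax attention for one query vector q at position q_pos.

    Returns (unnormalised output, normaliser); divide the two to get the output.
    """
    if hi is None:
        hi = len(K)
    size = hi - lo
    center = (lo + hi - 1) / 2.0
    dist = abs(q_pos - center)
    # Opening criterion: a span that is small relative to its distance from the
    # query is treated as one pooled pseudo-token, weighted by its size.
    if size > 1 and dist > 0 and size / dist < theta:
        k_bar = K[lo:hi].mean(axis=0)
        v_bar = V[lo:hi].mean(axis=0)
        w = size * np.exp(q @ k_bar)
        return w * v_bar, w
    # Leaf: a single token is attended to exactly.
    if size == 1:
        w = np.exp(q @ K[lo])
        return w * V[lo], w
    # Otherwise split the span and recurse on both halves.
    mid = (lo + hi) // 2
    out_l, z_l = bh_attention(q, K, V, q_pos, lo, mid, theta)
    out_r, z_r = bh_attention(q, K, V, q_pos, mid, hi, theta)
    return out_l + out_r, z_l + z_r
```

Per query this touches only O(log N) spans instead of N keys, so the total work is roughly N*log(N) rather than N^2.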
I think people are trying to do something similar (hierarchical approaches, summaries, tree-of-thought), but they still don't see the forest for the trees: language modelling is basically a physics problem.
All these N^2 connectedness/pathfinding problems appear to be expressible as gravity over some distance function, which is itself essentially a sorting problem, and sorting can be done with an N*log(N) algorithm.
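For what it's worth, a quick sanity check of the sketch above against exact softmax attention on random vectors (purely illustrative, reusing bh_attention from above; no claim about quality on a real model):

```python
# Compare the Barnes-Hut-style approximation against exact attention for one query.
import numpy as np

rng = np.random.default_rng(0)
N, d = 4096, 64
K = rng.normal(size=(N, d)) / np.sqrt(d)
V = rng.normal(size=(N, d))
q = rng.normal(size=d)

num, z = bh_attention(q, K, V, q_pos=N // 2)
approx = num / z

scores = np.exp(K @ q)
exact = (scores[:, None] * V).sum(axis=0) / scores.sum()
print("max abs error:", np.abs(approx - exact).max())
```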