Transformers Are Secretly Collectives of Spin Systems
A statistical mechanics perspective on transformers
How far can we push the idea of transformers as physical systems?
Can we model attention as the collective response of a statistical-mechanical system?