A non-equilibrium statistical mechanics perspective on transformers

A statistical mechanics perspective on transformers

How far can we push the idea of transformers as physical systems?

Can we model attention as the collective response of a statistical-mechanical system?

Can we swap softmax attention for energy-based attention?

Where does the energy function behind Transformers' attention mechanism come from?

Can an energy-based perspective shed light on training and improving Transformer models?

Matthias Bal © 2023
