Can we model attention as the collective response of a statistical-mechanical system?
Can we swap softmax attention for energy-based attention?
Where does the energy function behind Transformers' attention mechanism come from?
Can an energy-based perspective shed light on training and improving Transformer models?
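As a point of reference for the last two questions, one well-known candidate (a sketch following the modern Hopfield network literature, e.g. Ramsauer et al.; the specific construction developed here may differ) is a log-sum-exp energy over stored patterns $X = (x_1, \dots, x_N)$ whose fixed-point update reproduces softmax attention:

$$
E(\xi) \;=\; -\frac{1}{\beta}\,\log\sum_{i=1}^{N}\exp\!\big(\beta\, x_i^{\top}\xi\big) \;+\; \tfrac{1}{2}\,\xi^{\top}\xi,
\qquad
\xi^{\text{new}} \;=\; X\,\mathrm{softmax}\!\big(\beta\, X^{\top}\xi\big).
$$

With $\xi$ in the role of a query and the stored patterns $X$ acting as tied keys and values, a single minimization step of this energy is exactly one softmax attention update, which is one concrete way the questions above can be made precise.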