Dynamical Systems

Attention as Energy Minimization: Visualizing Energy Landscapes

Can we swap softmax attention for energy-based attention?
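
One way to make the question concrete is the following minimal sketch in NumPy. It is an illustration under stated assumptions, not the method of this post. It assumes a single query vector q, a key matrix K whose rows also serve as the values (tied keys and values, with no learned projections), and an inverse temperature beta. The names energy, attention_step, and beta are hypothetical and chosen only for this example. Under those assumptions, one softmax attention update is exactly one unit-step gradient descent move on a log-sum-exp energy:

```python
import numpy as np

def logsumexp(z):
    # Numerically stable log(sum(exp(z))).
    m = np.max(z)
    return m + np.log(np.sum(np.exp(z - m)))

def softmax(z):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def energy(q, K, beta=1.0):
    # Assumed energy: E(q) = -(1/beta) * logsumexp(beta * K q) + 0.5 * ||q||^2
    return -logsumexp(beta * (K @ q)) / beta + 0.5 * (q @ q)

def attention_step(q, K, beta=1.0):
    # Softmax attention with values tied to keys:
    # q_new = K^T softmax(beta * K q).
    # Since grad E(q) = q - K^T softmax(beta * K q),
    # this equals q - grad E(q), a gradient step of size 1.
    return K.T @ softmax(beta * (K @ q))

rng = np.random.default_rng(0)
K = rng.normal(size=(5, 3))   # 5 stored patterns (keys) of dimension 3
q = rng.normal(size=3)        # initial query / state

# Iterate the attention update and record the energy along the trajectory.
energies = [energy(q, K)]
for _ in range(10):
    q = attention_step(q, K)
    energies.append(energy(q, K))
```

Recording energy along the trajectory, as in the loop above, gives the raw data for a landscape plot: the values in energies should never increase from one step to the next. Standard transformer attention uses separate value vectors and learned projections, so this correspondence holds only in the tied setting assumed here.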