Speaker
Description
Transformers, which underlie the recent successes of large language models, represent the data as sequences of vectors called tokens. This representation is leveraged by the attention function, which learns dependencies between tokens and is key to the success of Transformers. However, the dynamics induced by the iterative application of attention across layers are not yet fully understood. To analyze these dynamics, we identify each input sequence with a probability measure, thus handling input sequences of arbitrary length, and model its evolution as a Vlasov equation called the Transformer PDE, whose velocity field is non-linear in the probability measure. For compactly supported initial data and several self-attention variants, we show that the Transformer PDE is well-posed and is the mean-field limit of an interacting particle system. We also study the case of Gaussian initial data, which has the nice property of remaining Gaussian throughout the dynamics. This allows us to identify typical behaviors theoretically and numerically, and to highlight a clustering phenomenon that parallels previous results in the discrete case.
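
As a concrete illustration (not taken from the talk or the underlying paper), the following minimal Python sketch integrates an interacting particle system of the kind whose mean-field limit is a Transformer PDE, using one common softmax self-attention velocity field dx_i/dt = sum_j softmax_j(<Q x_i, K x_j>) V x_j. The matrices Q, K, V, the step size, and the time horizon are illustrative assumptions, not the speaker's exact setup.

# Sketch only: Euler-integrate a softmax self-attention particle system
# with Gaussian initial data; Q, K, V and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 64                          # token dimension, number of particles
Q = K = np.eye(d)                     # assumed query/key matrices
V = np.diag([1.0, 0.5])               # assumed value matrix (distinct eigenvalues)
X = rng.normal(size=(n, d))           # Gaussian initial data (tokens as particles)

dt, n_steps = 1e-2, 500
for _ in range(n_steps):
    scores = (X @ Q.T) @ (X @ K.T).T              # scores[i, j] = <Q x_i, K x_j>
    scores -= scores.max(axis=1, keepdims=True)   # numerical stabilization
    W = np.exp(scores)
    W /= W.sum(axis=1, keepdims=True)             # row-wise softmax over j
    X = X + dt * W @ (X @ V.T)                    # explicit Euler step

# Normalized directions typically concentrate around a few values,
# echoing the clustering phenomenon mentioned in the abstract.
directions = X / np.linalg.norm(X, axis=1, keepdims=True)
print(np.round(directions[:8], 3))

Running the sketch and inspecting the printed directions (or plotting the particles over time) gives a quick, informal way to observe the clustering behavior that the abstract describes in the discrete setting.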