Description
The ambitious question of understanding a Transformer can be decomposed into understanding the functions it implements: the class of functions a Transformer can theoretically approximate, the subclass of those functions that is learnable via gradient descent, which sets of functions a given training data distribution is implicitly biased towards, how these functions are implemented across the neural components of the model, and so on. In this talk, I will focus on Transformers implementing language functions. I will begin with a primer on mechanistic interpretability, followed by a discussion of open problems in this area. I will then present an alternative view of Transformer functions that can potentially address several existing limitations: the existence of multiple parallel computation paths, the lack of robustness of autoencoder-based replacement models, and the difficulty of formalizing the causal models embedded in training.