
Conditional Computation in Transformers

Speaker(s)
Sebastian Jaszczur
Affiliation
University of Warsaw
Date
2 June 2022, 12:15
Room
3140
Seminar
"Machine Learning" seminar

Note: the seminar will be held in person, with a follow-up lunch.

 

The Transformer architecture is widely used in Natural Language Processing to obtain state-of-the-art results. Unfortunately, such quality is usually attainable only with extremely large models, which require significant resources during both training and inference. This problem can be tackled by designing a neural network that conditionally skips a significant portion of the model's weights and computation, leaving only the parts useful for the example at hand. I will describe some of the recent developments in this area, including my work published at NeurIPS 2021, "Sparse is Enough in Scaling Transformers", and my ongoing research.
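
To give a flavour of the idea, below is a minimal, illustrative sketch (in PyTorch) of conditional computation in a feed-forward block: only the k strongest hidden units are kept per token, so most of the layer's work could in principle be skipped. This is a simplified, dense-masked toy version written for this announcement, not the actual implementation from "Sparse is Enough in Scaling Transformers", which selects active weights so that the skipped computation is never performed.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKSparseFFN(nn.Module):
    """Toy feed-forward block that keeps only the k largest hidden
    activations per token and zeroes out the rest (illustration only)."""

    def __init__(self, d_model: int, d_ff: int, k: int):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_ff)
        self.w_out = nn.Linear(d_ff, d_model)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = F.relu(self.w_in(x))                      # (..., d_ff)
        # Keep the k strongest hidden units per token; zero the others.
        topk = torch.topk(h, self.k, dim=-1)
        mask = torch.zeros_like(h).scatter_(-1, topk.indices, 1.0)
        return self.w_out(h * mask)

# Example: only 32 of 2048 hidden units stay active for each token.
layer = TopKSparseFFN(d_model=512, d_ff=2048, k=32)
tokens = torch.randn(4, 16, 512)                      # (batch, seq, d_model)
out = layer(tokens)
print(out.shape)                                      # torch.Size([4, 16, 512])

In a real sparse model, the selection would be used to gather only the needed rows and columns of the weight matrices, turning the sparsity into actual savings in compute and memory rather than a mask over a dense computation.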