
Sparsity in Efficient Transformers

Speaker(s)
Sebastian Jaszczur
Date
June 17, 2021, 12:15
Event information
Google Meet (meet.google.com/yew-oubf-ngi)
Seminar
"Machine Learning" seminar

Large Transformer models yield impressive results on many tasks, but are expensive to train or even fine-tune, and so slow at decoding that their use and study become out of reach. We address this problem by leveraging sparsity. We study sparse variants of all layers in the Transformer and propose a new Transformer variant that uses sparse layers to decode much faster than the standard Transformer as the model size scales. Surprisingly, the sparse layers are enough to obtain the same model quality as the standard Transformer.
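To illustrate the general idea of sparse layers, below is a minimal PyTorch sketch of a sparse feed-forward block in which a small controller selects only the top-k hidden units per token. This is an assumption-laden simplification for intuition only, not the layer design presented in the talk; the class name, the top-k gating scheme, and all hyperparameters here are hypothetical.

```python
import torch
import torch.nn as nn


class SparseFeedForward(nn.Module):
    """Illustrative sparse feed-forward block (hypothetical sketch).

    A learned controller scores the d_ff hidden units for each token and
    keeps only the top-k of them; the rest are zeroed out. At decoding time,
    a real implementation would compute just the selected rows/columns of
    the weight matrices, which is where the speed-up comes from.
    """

    def __init__(self, d_model: int, d_ff: int, k: int):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_ff)
        self.w_out = nn.Linear(d_ff, d_model)
        self.controller = nn.Linear(d_model, d_ff)  # per-token unit scores
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        scores = self.controller(x)                       # (batch, seq, d_ff)
        topk = scores.topk(self.k, dim=-1).indices        # active unit indices
        mask = torch.zeros_like(scores).scatter_(-1, topk, 1.0)
        hidden = torch.relu(self.w_in(x)) * mask          # only k units non-zero
        return self.w_out(hidden)


if __name__ == "__main__":
    layer = SparseFeedForward(d_model=64, d_ff=256, k=8)
    out = layer(torch.randn(2, 10, 64))
    print(out.shape)  # torch.Size([2, 10, 64])
```

In this toy version the sparsity is enforced with a mask, so the full dense computation is still performed; the point is only to show that each token touches a small, input-dependent subset of the feed-forward units, which is the property the talk exploits for faster decoding.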