Sparsity in Efficient Transformers

Speaker(s)
Sebastian Jaszczur
Date
June 17, 2021, 12:15 p.m.
Information about the event
google meet (meet.google.com/yew-oubf-ngi)
Seminar
Seminar "Machine Learning"

Large Transformer models yield impressive results on many tasks, but are expensive to train, or even fine-tune, and so slow at decoding that their use and study become out of reach. We address this problem by leveraging sparsity. We study sparse variants of all layers in the Transformer and propose a new Transformer variant that uses sparse layers to decode much faster than the standard Transformer as the model size scales. Surprisingly, the sparse layers are enough to obtain the same model quality as the standard Transformer.
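To give a rough sense of what a "sparse layer" can mean here, below is a minimal sketch of one possible sparse feed-forward block, written in PyTorch. It assumes a mechanism not spelled out in the abstract: a small controller scores the hidden units and only the top-k of them are kept per token, so a decoder would only need a fraction of the weights. The class name, dimensions, and the top-k controller are illustrative assumptions, not the speaker's actual design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseFeedForward(nn.Module):
    # Hypothetical sparse feed-forward block: a small controller scores the
    # d_ff hidden units and only the top-k are kept for each token.
    def __init__(self, d_model=512, d_ff=2048, k=64):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_ff)
        self.w_out = nn.Linear(d_ff, d_model)
        self.controller = nn.Linear(d_model, d_ff)  # scores hidden units
        self.k = k

    def forward(self, x):
        # x: (batch, seq, d_model)
        scores = self.controller(x)                    # (batch, seq, d_ff)
        topk = scores.topk(self.k, dim=-1).indices     # keep k of d_ff units
        mask = torch.zeros_like(scores).scatter_(-1, topk, 1.0)
        hidden = F.relu(self.w_in(x)) * mask           # zero out unselected units
        return self.w_out(hidden)

x = torch.randn(2, 10, 512)
ffn = SparseFeedForward()
print(ffn(x).shape)  # torch.Size([2, 10, 512])

Note that this sketch only simulates sparsity with a mask; a real decoding-time speedup would gather and multiply only the selected rows and columns of the weight matrices.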