Sparsity in Efficient Transformers
- Speaker(s)
- Sebastian Jaszczur
- Date
- June 17, 2021, 12:15 p.m.
- Information about the event
- Google Meet (meet.google.com/yew-oubf-ngi)
- Seminar
- Seminar "Machine Learning"
Large Transformer models yield impressive results on many tasks, but are expensive to train, or even fine-tune, and so slow at decoding that their use and study become out of reach. We address this problem by leveraging sparsity. We study sparse variants of all layers in the Transformer and propose a new Transformer variant that uses sparse layers to decode much faster than the standard Transformer as the model size scales. Surprisingly, the sparse layers are enough to obtain the same model quality as the standard Transformer.
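To give a flavour of what "sparse layers" can mean here, below is a minimal, illustrative sketch (not the speaker's implementation) of a feed-forward block in which a per-token controller selects a small subset of hidden units and masks out the rest. All names and the PyTorch framing are assumptions for illustration; an efficient implementation would multiply only the selected columns rather than masking the full matrix, which is where the decoding speedup would come from.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseFeedForward(nn.Module):
    """Hypothetical sketch: a feed-forward block where a controller picks,
    per token, which hidden units are active; the rest are zeroed out."""

    def __init__(self, d_model: int, d_ff: int, n_active: int):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_ff)
        self.w_out = nn.Linear(d_ff, d_model)
        # Controller that scores each hidden unit for every token.
        self.controller = nn.Linear(d_model, d_ff)
        self.n_active = n_active

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        scores = self.controller(x)                       # (batch, seq, d_ff)
        topk = scores.topk(self.n_active, dim=-1).indices
        mask = torch.zeros_like(scores).scatter_(-1, topk, 1.0)
        hidden = F.relu(self.w_in(x)) * mask              # only selected units survive
        return self.w_out(hidden)


if __name__ == "__main__":
    layer = SparseFeedForward(d_model=512, d_ff=2048, n_active=64)
    out = layer(torch.randn(2, 10, 512))
    print(out.shape)  # torch.Size([2, 10, 512])
```

This toy version only illustrates the selection step; the computational saving at decoding time depends on gathering and multiplying just the active slice of the weight matrices.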