Sparsity in Efficient Transformers
- Speaker(s)
- Sebastian Jaszczur
- Date
- June 17, 2021, 12:15
- Event information
- Google Meet (meet.google.com/yew-oubf-ngi)
- Seminar
- "Machine Learning" seminar
Large Transformer models yield impressive results on many tasks, but are expensive to train, or even fine-tune, and so slow at decoding that their use and study become out of reach. We address this problem by leveraging sparsity. We study sparse variants for all layers in the Transformer and propose a new variant of the Transformer, which uses sparse layers to decode much faster than the standard Transformer when scaling the model size. Surprisingly, the sparse layers are enough to obtain the same model quality as the standard Transformer.
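To illustrate the general idea of a sparse layer, here is a minimal, hypothetical sketch (not the speaker's exact method): a Transformer feedforward block in which only the top-k hidden activations per token are kept, so that at decoding time only the corresponding rows and columns of the weight matrices would need to be touched. All names and sizes below are illustrative; in a real fast decoder the active units would have to be predicted before the full matrix multiply, whereas this toy version computes the full hidden layer for simplicity.

```python
import numpy as np

def sparse_ffn(x, w_in, w_out, k):
    """Toy sparse feedforward block for a single token.

    x:     (d_model,) token activation
    w_in:  (d_model, d_ff) input projection
    w_out: (d_ff, d_model) output projection
    k:     number of hidden units kept (the sparsity level)
    """
    hidden = np.maximum(x @ w_in, 0.0)        # standard ReLU feedforward
    top = np.argpartition(hidden, -k)[-k:]    # indices of the k largest activations
    # Only the selected hidden units contribute to the output; this is where
    # decoding savings would come from once the selection is cheap to predict.
    return hidden[top] @ w_out[top]

# Hypothetical usage with illustrative sizes
rng = np.random.default_rng(0)
d_model, d_ff, k = 8, 32, 4
x = rng.standard_normal(d_model)
w_in = rng.standard_normal((d_model, d_ff))
w_out = rng.standard_normal((d_ff, d_model))
print(sparse_ffn(x, w_in, w_out, k).shape)    # (8,)
```

The design point this sketch tries to convey is that with k much smaller than d_ff, the output projection touches only a small slice of the weights per token, which is the kind of saving the talk attributes to sparse layers at decoding time.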