Towards Generative Music
- Speaker(s)
- Jan Ludziejewski
- Affiliation
- Uniwersytet Warszawski
- Date
- Dec. 16, 2021, 12:15 p.m.
- Information about the event
- meet.google.com/ooi-zxye-dxa
- Seminar
- Seminarium "Machine Learning"
OpenAI's Jukebox was a groundbreaking model in sound generation and is still considered state-of-the-art for the music modeling task. It consists of two separate networks: a Vector Quantized Variational Autoencoder (VQ-VAE), which strongly compresses the raw waveform into a sequence of discrete tokens, and a Transformer that performs the modeling task in the VQ-VAE's internal language. While this model was the first able to generate coherent tracks in a chosen style, it has several major drawbacks, namely an extremely slow generation time (540 times slower than real time) and poor sound quality, especially at high frequencies. Moreover, because of this lack of consistency, the widely publicized results had to be cherry-picked, post-processed and generally created in a human-guided manner. Modern approaches to fast transformers (e.g. the Performer) may address the slow generation, potentially enabling online evaluation, while adversarial training can help with the quality issues. This generative model can also be adapted to other tasks such as score transcription, sound separation and score tracking.
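To make the two-stage idea concrete, here is a minimal NumPy sketch of the vector-quantization step at the heart of a VQ-VAE: each continuous latent frame is replaced by the index of its nearest codebook vector, and that index sequence is the discrete "language" a Transformer can then model. The shapes and codebook size are illustrative assumptions, not Jukebox's actual configuration.

```python
import numpy as np

def quantize(frames, codebook):
    """Map each latent frame to its nearest codebook vector.

    Returns the discrete token indices (what the Transformer models)
    and the corresponding quantized latent vectors.
    """
    # Squared Euclidean distance between every frame and every code.
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = dists.argmin(axis=1)        # one discrete token per frame
    return tokens, codebook[tokens]      # indices and quantized latents

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))       # hypothetical: 8 codes, 4-dim latents
frames = rng.normal(size=(16, 4))        # hypothetical: 16 encoded audio frames
tokens, quantized = quantize(frames, codebook)
```

In the full model, a convolutional encoder produces `frames` from raw audio, the token sequence is modeled autoregressively, and a decoder reconstructs the waveform from the quantized latents; the heavy compression at this step is exactly what causes the high-frequency quality loss mentioned above.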