
Towards Generative Music

Speaker(s)
Jan Ludziejewski
Affiliation
Uniwersytet Warszawski
Date
16 December 2021, 12:15
Event information
meet.google.com/ooi-zxye-dxa
Seminar
"Uczenie maszynowe" (Machine Learning) seminar

OpenAI Jukebox was a groundbreaking model in sound generation and is still considered the state of the art in the music modeling task. It consists of two separate networks: a Vector Quantized Variational Autoencoder (VQ-VAE), which strongly compresses the raw waveform into a series of discrete tokens, and a Transformer, which performs the modeling task in the VQ-VAE's internal language. While this model was the first able to generate coherent tracks in a chosen style, it has several major drawbacks, namely extremely slow generation (540 times slower than real time, i.e. about nine hours per minute of audio) and poor sound quality, especially at high frequencies. Moreover, because of a lack of consistency, the popular published results had to be cherry-picked, post-processed and generally created in a human-guided manner. Modern approaches to fast Transformers (e.g. Performer) may offer an answer to the slow generation, potentially bringing it to online (real-time) speed, and adversarial training can help with its performance issues. The generative model can also be adapted to other tasks such as score transcription, sound separation and score tracking.
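
To make the "series of discrete tokens" concrete, here is a minimal sketch of the vector quantization step only: each latent vector is replaced by the index of its nearest codebook entry. This is an illustration under stated assumptions, not Jukebox's implementation. The function name `quantize`, the toy codebook and the array sizes are all hypothetical; a real VQ-VAE learns its codebook and produces the latents with a trained encoder.

```python
import numpy as np


def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    latents:  array of shape (T, D), one vector per time step
    codebook: array of shape (K, D), one vector per discrete token
    Returns the token indices (T,) and the quantized vectors (T, D).
    """
    # Squared Euclidean distance between every latent and every code: (T, K)
    diffs = latents[:, None, :] - codebook[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    # The discrete token is the index of the closest code
    tokens = dists.argmin(axis=1)
    return tokens, codebook[tokens]


# Toy codebook (an assumption for illustration): code k is the vector [k, k, k, k]
codebook = np.arange(8, dtype=float)[:, None] * np.ones((1, 4))

# Hypothetical "encoder outputs": codes 3, 1, 3, 7 with a small offset added
latents = codebook[[3, 1, 3, 7]] + 0.1

tokens, quantized = quantize(latents, codebook)
print(tokens)  # token sequence that a Transformer would then model
```

In this picture, the Transformer described in the abstract operates on the `tokens` sequence rather than on raw audio samples, and a decoder maps the quantized vectors back to a waveform.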