🎶Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: A Survey 🧠
About a decade ago, the music and audio tech community began borrowing heavily from Computer Vision, with models like VGGish and other convolutional architectures applied to spectrograms and piano roll representations.
Fast forward to today, and the influence has clearly shifted to Natural Language Processing — inspiring not just architectures like Music Transformer, but also prompt-based generative models like our lab’s own Mustango and Text2Midi.
To further inspire researchers and practitioners in this space, I’m excited to highlight a comprehensive survey led by Dinh-Viet-Toan Le (PhD student, University of Lille), who was a visiting researcher at the AMAAI Lab at the Singapore University of Technology and Design.
📝 Published in ACM Computing Surveys, the paper explores how NLP methods are being adapted for symbolic music generation and music information retrieval. You’ll find a detailed overview of tokenization methods, model architectures, and more. It’s an essential read if you’re working on generative music models or symbolic MIR.
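To give a flavor of what “tokenization” means for symbolic music, here is a toy REMI-style sketch: each note is serialized into bar, position, pitch, and duration tokens, turning a score into a flat “sentence” that sequence models can consume. The event names and grid resolution below are my own illustration, not the survey’s or any specific toolkit’s vocabulary.

```python
from dataclasses import dataclass

@dataclass
class Note:
    bar: int        # bar index within the piece
    position: int   # onset position within the bar (e.g. on a 16th-note grid)
    pitch: int      # MIDI pitch number
    duration: int   # duration in grid steps

def tokenize(notes: list[Note]) -> list[str]:
    """Serialize notes into a flat token sequence, emitting one bar marker per bar."""
    tokens: list[str] = []
    current_bar = None
    for note in sorted(notes, key=lambda n: (n.bar, n.position, n.pitch)):
        if note.bar != current_bar:
            tokens.append(f"Bar_{note.bar}")
            current_bar = note.bar
        tokens += [f"Pos_{note.position}", f"Pitch_{note.pitch}", f"Dur_{note.duration}"]
    return tokens

# A C major triad on the downbeat of bar 0:
chord = [Note(0, 0, 60, 4), Note(0, 0, 64, 4), Note(0, 0, 67, 4)]
print(tokenize(chord))
# → ['Bar_0', 'Pos_0', 'Pitch_60', 'Dur_4', 'Pos_0', 'Pitch_64', 'Dur_4', 'Pos_0', 'Pitch_67', 'Dur_4']
```

Real tokenizers differ in exactly these design choices — which attributes become tokens, how time is encoded, how the vocabulary is built — and the survey compares the trade-offs in depth.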
📖 Read the paper (or access preprint)
📂 GitHub overview
👥 By Dinh-Viet-Toan Le, Louis Bigo, Mikaela Keller, and Dorien Herremans
If you’re working in music AI, generative models, or symbolic MIR, we hope this paper becomes a valuable resource.