Facing the next great musical frontier, Taehyun Kim, Chief Strategy Officer at POZAlabs, tells us about the possibilities and challenges ahead for AI and music, and what it all means for people who aren't traditionally adept at music composition.
If you could solve any global problem in the world with AI, what would it be and why?
I aim to overcome the hurdles that make music composition inaccessible. Among the arts, composition is perhaps one of the most difficult for the general public to approach. Through the symbolic music generation AI model we developed, our goal is to create a world where anyone can compose music as a leisure activity, regardless of musical background.
What do you think are the 3 most important things for businesses in relation to AI at the moment?
1. There’s a greater need for awareness around copyright issues related to training data in industry settings than in academia. POZAlabs is fundamentally protected from such claims because our team of in-house professional composers continually creates original music data - over 700,000 music samples to date - to train our model.
2. I think there are limitations to the current state of end-to-end models. Users should be able to interact with models whose intermediate outputs are interpretable, and those models should let users easily and accurately apply and edit prompts.
3. In a real-world business setting, it is difficult to gather big data in a field such as symbolic music generation, the problem we are trying to solve. Since constructing foundation models through self-supervised learning is unviable in this situation, a different strategic approach is needed. To overcome these challenges, we modularized every step by adopting the mechanisms human composers use, utilized small bricks of data, and applied segmented deep learning models.
What will take AI capabilities to the next level?
We are already seeing the rise of outstanding generative AI models, such as DALL-E and Stable Diffusion in computer vision and ChatGPT and Bard in natural language processing. These models were feasible because of a distinctive property of CV and NLP: enormous amounts of image and natural language data can be amassed from the internet.
With the gathered data, it is possible to build a foundation model using self-supervised learning. However, it’s another story for fields like symbolic music generation where it is impossible to build a model by gathering data in such a way.
We see this as solvable through two strategies: ‘divide and conquer’ and ‘objective breakdown of music’. The ‘divide and conquer’ strategy lets us build a music-generation pipeline from multiple models, each with a specific task, rather than having a single model do all the work. The success of the objective breakdown of music, meanwhile, lies in creating a quality dataset that is well-tagged, objective, and easy for a computer to understand.
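As a rough illustration of the ‘divide and conquer’ idea, a pipeline can chain small, single-purpose stages whose intermediate outputs remain interpretable and editable. The sketch below is purely hypothetical - the stage names, rule-based logic, and tag schema are illustrative assumptions, not POZAlabs' actual architecture (which uses deep learning models at each stage):

```python
# Hypothetical 'divide and conquer' music-generation pipeline.
# Each stage handles one task; real systems would use a trained
# model per stage instead of these toy rules.

def generate_chords(tags):
    # Stage 1: choose a chord progression from the tagged genre.
    progressions = {
        "pop": ["C", "G", "Am", "F"],
        "jazz": ["Dm7", "G7", "Cmaj7", "A7"],
    }
    return progressions.get(tags.get("genre"), progressions["pop"])

def generate_melody(chords):
    # Stage 2: derive a simple melody note from each chord's root.
    return [chord[0] for chord in chords]

def arrange(chords, melody, tags):
    # Stage 3: combine the interpretable intermediate outputs.
    return {"chords": chords, "melody": melody, "tempo": tags.get("tempo", 120)}

def compose(tags):
    # The pipeline: each stage is separately trainable, inspectable,
    # and editable before the next stage runs.
    chords = generate_chords(tags)
    melody = generate_melody(chords)
    return arrange(chords, melody, tags)

if __name__ == "__main__":
    print(compose({"genre": "pop", "tempo": 100}))
```

Because each stage exposes its output, a user could edit the chord progression before the melody stage runs - the kind of intermediate interaction an end-to-end model cannot offer.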
Why did you choose to present at the WSAI Series this year?
AI is becoming more accessible to the general public than ever and it is important to share perspectives about where the development of AI is headed. I’d also like to share our company’s vision to help create a healthier music industry landscape and how our technology can catalyze the change.
What are you most excited about taking part in the WSAI Series?
Hearing different stories and applications around AI. It's a rare opportunity to meet AI experts from around the world and share inspiration.