AI Is Scaling Biases That the Music Industry Has Already Built (Guest Column)
Category: Music
Long-embedded music business inequities are being baked into AI training data that will shape what gets heard for years to come.
By Billboard | 03/06/2026
Africa, the Middle East and South Asia represent roughly half the world’s population, and they are home to hundreds of distinct musical traditions. But in the training datasets most commonly used to build music AI models, music from Africa accounts for only 0.3%, the Middle East for 0.4% and South Asia for 0.9%, whereas Western genres make up 94%. These numbers come from researchers at Abu Dhabi’s Mohamed bin Zayed University of Artificial Intelligence, who surveyed the training datasets behind today’s generative music tools and presented the findings at the 2025 conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL).

When those models tried generating music in the tradition of an Indian raga, they defaulted to a sitar playing Western tonal structures, producing something that sounded Western with an Indian instrument on top. The same study tested Turkish Makam, a melodic system built on intervals that don’t exist on a Western piano. Once again, the models flattened those intervals into standard Western pitch. When the researchers fed the model additional Hindustani Classical and Turkish Makam recordings to correct the bias, its creative output actually got worse. The Western training data was too dominant to override.

The study confirms that the problem goes deeper than underrepresentation: the biases embedded in decades of music data are now being built into the AI systems trained on that data. And it’s these systems that will shape what gets heard, paid and promoted for years to come.
The datasets powering the music industry have been shaped over decades by who got signed, which markets were considered worth tracking, and which genres received investment. Industry infrastructure is built around particular slices of the business, yet it is treated as though it represents the whole. For a long time, those gaps sat quietly in back-office databases, and the consequences were slow. Now those gaps sit in the training data, and the systems built on top of them will run for the foreseeable future.

The bias extends to gender. In 2025, women represented 14.5% of songwriters on the Billboard Hot 100 and only 4.4% of producers, according to the USC Annenberg Inclusion Initiative, which has been tracking these numbers for over a decade. Those figures have barely shifted since 2012.

Algorithms are learning from a foundation that reflects not what people want to hear but what’s already popular, promoted and playlisted at scale. Those outputs are then fed straight back into the loop. A song added to Spotify’s Today’s Top Hits generates mill