David Huron
American Association for the Advancement of Science, 167th Program, (2001) p. A44.
Session on Understanding Music With Statistical Models.
The availability of large symbolic musical databases has provided unprecedented opportunities for music-related research. Some of these databases have been created by scholarly organizations such as the Center for Computer Assisted Research in the Humanities and the Répertoire International des Sources Musicales (RISM). Most symbolic databases have been assembled by amateurs using MIDI. More recently, proprietary databases have been assembled by "dot-com" companies collecting user-response data from the Internet.
Such databases greatly facilitate studies in musical stylistics, taste, musical similarity, performance analysis, and even forensic musicology. The databases are used in many ways, from "name-that-tune" searching to automated music summarization and the training of neural networks.
While these databases offer important opportunities, they also raise difficult methodological challenges. Available materials exhibit widely differing quality, so estimating error rates is essential. The variety of data formats means that missing or incompatible information is commonplace. Non-random (opportunistic) samples present onerous statistical problems for researchers.
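As a rough illustration of what estimating an error rate might involve, the sketch below computes a Wilson score interval for the proportion of erroneous notes in a hand-checked sample. The function name, the counts, and the choice of the Wilson interval are assumptions made for illustration; they are not taken from the presentation. The interval is only meaningful if the checked notes were drawn at random, which is exactly the condition that opportunistic samples fail to meet.

```python
import math


def wilson_interval(errors, sample_size, z=1.96):
    """Return (low, high) Wilson score bounds for an error proportion.

    z=1.96 corresponds to an approximate 95% interval.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= errors <= sample_size:
        raise ValueError("errors must lie between 0 and sample_size")

    p = errors / sample_size              # observed error proportion
    z2 = z * z
    denom = 1 + z2 / sample_size
    centre = (p + z2 / (2 * sample_size)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / sample_size + z2 / (4 * sample_size ** 2))
    ) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical figures: 12 wrong notes among 400 randomly chosen, hand-checked notes.
low, high = wilson_interval(12, 400)
print(f"observed error rate {12 / 400:.3f}, interval {low:.3f} to {high:.3f}")
```

The Wilson interval is used here rather than the simpler normal approximation because it behaves sensibly when the observed error count is small or zero, a common situation with carefully encoded scholarly sources.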
The opportunities and problems presented by musical databases are discussed and illustrated through several contrasting applications, including the use of such data to generate geographical maps of musical cultures.