
Abstract
TAMPERE UNIVERSITY OF TECHNOLOGY
Department of Information Technology
Institute of Signal Processing
ERONEN, ANTTI: Automatic musical instrument recognition
Master of Science Thesis, 69 pages
Examiners: Prof. Jaakko Astola, MSc Anssi Klapuri
Funding: Tampere University of Technology, Institute of Signal Processing
October 2001
Keywords: Automatic musical instrument recognition, sound source recognition, timbre recognition, audio content analysis, computational auditory scene analysis
This thesis concerns the automatic recognition of musical instruments, where the idea is to
build computer systems that “listen” to musical sounds and recognize which instrument is
playing. Experimental material consisted of 5286 single notes from Western orchestral instru-
ments, the timbre of which have been studied in great depth. The literature review part of this
thesis introduces the studies on the sound of musical instruments, as well as related knowledge
on instrument acoustics. Together with the state of the art in automatic sound source recognition systems, these form the foundation for the most important part of this thesis: the extraction
of perceptually relevant features from acoustic musical signals.
Several different feature extraction algorithms were developed and implemented, and used as a front-end for a pattern recognition system. The performance of the system was evaluated in several experiments. Using feature vectors that included cepstral coefficients and features relating to the type of excitation, brightness, modulations, asynchrony, and fundamental frequency of tones, an accuracy of 35 % was obtained on a database including several examples of 29 instruments. Recognition of the instrument family, among six possible classes, was successful in 77 % of the cases.
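The abstract does not spell out the exact feature extractor, but the cepstral coefficients it mentions can be illustrated with a minimal sketch. The snippet below computes low-order real-cepstrum coefficients of a synthetic harmonic note; this is a simplification (practical systems typically use mel-scaled filterbanks), and the function name, note parameters, and coefficient count are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def cepstral_coefficients(signal, n_coeffs=12):
    """Real cepstrum: inverse FFT of the log magnitude spectrum.
    The low-order coefficients summarize the spectral envelope,
    which correlates with perceived timbre."""
    spectrum = np.abs(np.fft.rfft(signal))
    log_spectrum = np.log(spectrum + 1e-10)  # offset avoids log(0)
    cepstrum = np.fft.irfft(log_spectrum)
    return cepstrum[:n_coeffs]

# Synthetic "note": a 440 Hz tone with three harmonics at 16 kHz
sr = 16000
t = np.arange(0, 0.5, 1 / sr)
note = sum(amp * np.sin(2 * np.pi * 440 * k * t)
           for k, amp in enumerate([1.0, 0.5, 0.25], start=1))

coeffs = cepstral_coefficients(note)
print(coeffs.shape)  # (12,)
```

In a full system such coefficient vectors, computed frame by frame and combined with the other features listed above, would form the input to the pattern recognition back-end.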
The performance of the system and the confusions it made were compared to the results
reported for human perception. The comparison shows that the performance of the system is
worse than that of humans in a similar task (46 % in individual instrument and 92 % in
instrument family recognition [Martin99]), although it is comparable to the performance of
other reported systems. The confusions made by the system resemble those of human subjects, indicating that the feature extraction algorithms have managed to capture perceptually relevant information from the acoustic signals.
