Mid-level features for audio chord recognition using a deep neural network
Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki, Volume 155 (2013) no. 4, pp. 109–117. This article was harvested from the source Math-Net.Ru.


Deep neural networks composed of several pre-trained layers have been successfully applied to various audio processing tasks. This paper proposes and examines several configurations of deep neural networks (including deep recurrent networks) that can be pretrained with stacked denoising autoencoders and applied to feature extraction for the audio chord recognition task. The features obtained from an audio spectrogram with such a network can be used instead of conventional chroma features to recognize the chords actually played in a recording. The chord recognition quality achieved with the proposed features is compared to that achieved with conventional chroma features, which do not rely on any machine learning technique.
Keywords: audio chord recognition, recurrent network, deep learning, autoencoder.
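The record contains no code. As a rough illustration of the approach described in the abstract, the sketch below (in Python with PyTorch, a choice made here purely for illustration) greedily pretrains a small stack of denoising autoencoders on spectrogram frames and takes the deepest hidden activations as mid-level features. The layer sizes, noise level, training schedule, and random placeholder data are assumptions for the example and do not reproduce the configuration studied in the paper.

# Sketch: greedy layer-wise pretraining of a stacked denoising autoencoder
# on spectrogram frames, then using the deepest hidden layer as mid-level
# features for chord recognition. All sizes and settings are illustrative.
import torch
import torch.nn as nn

def pretrain_denoising_layer(data, hidden_dim, noise_std=0.1,
                             epochs=20, lr=1e-3):
    """Train one denoising autoencoder layer and return its encoder."""
    in_dim = data.shape[1]
    encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
    decoder = nn.Linear(hidden_dim, in_dim)
    opt = torch.optim.Adam(list(encoder.parameters()) +
                           list(decoder.parameters()), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        corrupted = data + noise_std * torch.randn_like(data)  # corrupt input
        recon = decoder(encoder(corrupted))                     # reconstruct clean frames
        loss = loss_fn(recon, data)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return encoder

if __name__ == "__main__":
    # Placeholder input: 1000 spectrogram frames with 180 log-frequency bins
    # (a constant-Q-like representation; real data would come from audio).
    frames = torch.rand(1000, 180)

    layer_sizes = [90, 36, 12]   # final 12-dimensional layer, like a chroma vector
    encoders, features = [], frames
    for hidden_dim in layer_sizes:
        enc = pretrain_denoising_layer(features.detach(), hidden_dim)
        encoders.append(enc)
        features = enc(features.detach())

    # 'features' now holds mid-level features that could replace chroma
    # vectors as input to a chord classifier.
    print(features.shape)  # torch.Size([1000, 12])

In a real pipeline the placeholder frames would be replaced by spectrogram frames of annotated training recordings, and the resulting features would be fed to a chord classifier in place of chroma vectors.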
@article{UZKU_2013_155_4_a10,
     author = {N. Glazyrin},
     title = {Mid-level features for audio chord recognition using a~deep neural network},
     journal = {U\v{c}\"enye zapiski Kazanskogo universiteta. Seri\^a Fiziko-matemati\v{c}eskie nauki},
     pages = {109--117},
     year = {2013},
     volume = {155},
     number = {4},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/UZKU_2013_155_4_a10/}
}
TY  - JOUR
AU  - N. Glazyrin
TI  - Mid-level features for audio chord recognition using a deep neural network
JO  - Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki
PY  - 2013
SP  - 109
EP  - 117
VL  - 155
IS  - 4
UR  - http://geodesic.mathdoc.fr/item/UZKU_2013_155_4_a10/
LA  - en
ID  - UZKU_2013_155_4_a10
ER  - 
%0 Journal Article
%A N. Glazyrin
%T Mid-level features for audio chord recognition using a deep neural network
%J Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki
%D 2013
%P 109-117
%V 155
%N 4
%U http://geodesic.mathdoc.fr/item/UZKU_2013_155_4_a10/
%G en
%F UZKU_2013_155_4_a10
N. Glazyrin. Mid-level features for audio chord recognition using a deep neural network. Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki, Volume 155 (2013) no. 4, pp. 109–117. http://geodesic.mathdoc.fr/item/UZKU_2013_155_4_a10/
