Abstract
To exploit the temporal continuity of birdsong, this paper proposes a birdsong classification model with two feature channels that combines time-domain and time-frequency-domain features. To make better use of these features, we denoise the original time-domain waveform with an improved average threshold method, which reduces the influence of noise. We also discuss the most suitable feature extractor for each feature and the best way to fuse the two. A 3D convolutional neural network (3DCNN) and a 2D convolutional neural network (2DCNN) serve as feature extractors for the log-mel spectrum and the waveform images, respectively. The high-level features extracted by the two channels are fused at the middle stage, and the resulting enhanced feature is fed to a double gated recurrent unit (d-GRU) network. Birdsongs of four species from Xeno-Canto were selected for testing. The results show that three techniques each improve classification: fusing time-domain and time-frequency-domain features, weighted average threshold noise reduction, and extracting birdsong features with different types of feature extractors. The proposed method achieves a mean average precision (MAP) of 95.9% in the classification comparison experiments, an encouraging result.
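The abstract reports its result as mean average precision (MAP) but does not spell out how the metric is computed. The sketch below shows one common definition, assumed here rather than taken from the paper: for each class, samples are ranked by the model's score for that class, precision is averaged over the ranks at which true members of the class appear, and the per-class values are then averaged. The function names and the toy scores are hypothetical and are not the authors' evaluation code.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision for one class.

    scores: model score for this class, one per sample.
    labels: 1 if the sample truly belongs to this class, else 0.
    """
    # Rank samples from highest to lowest score (ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            # Precision at this rank: positives found so far / samples seen.
            precision_sum += hits / rank
    # A class with no positive samples is given AP = 0 by convention here.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    score_matrix: Sequence[Sequence[float]],
    true_classes: Sequence[int],
    num_classes: int,
) -> float:
    """Mean of the per-class average precision values."""
    aps = []
    for c in range(num_classes):
        class_scores = [row[c] for row in score_matrix]
        class_labels = [1 if t == c else 0 for t in true_classes]
        aps.append(average_precision(class_scores, class_labels))
    return sum(aps) / num_classes


if __name__ == "__main__":
    # Toy scores for 4 recordings and 2 classes (made-up numbers).
    toy_scores = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
    toy_truth = [0, 0, 1, 1]
    print(mean_average_precision(toy_scores, toy_truth, num_classes=2))
```

With four species, as in the experiments described above, `num_classes` would be 4 and each row of `score_matrix` would hold the four class scores for one test recording.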
Funding
This work was supported in part by the National Natural Science Foundation of China (Grant No. 61703441).
About this article
Cite this article
Liu, Z., Chen, W., Chen, A. et al. Birdsong classification based on multi feature channel fusion. Multimed Tools Appl 81, 15469–15490 (2022). https://doi.org/10.1007/s11042-022-12570-3
DOI: https://doi.org/10.1007/s11042-022-12570-3