Birdsong classification based on multi feature channel fusion

Multimedia Tools and Applications

Abstract

Birdsong in nature is continuous in time. To exploit this property, this paper proposes a birdsong classification model built from two feature channels that combines time-domain and time-frequency-domain features. To make better use of the features, we apply an improved average threshold method to denoise the original time-domain waveform features and reduce the influence of noise. We discuss the most suitable feature extractor for each channel and the best method of fusing the two features. A 3D convolutional neural network (3DCNN) and a 2D convolutional neural network (2DCNN) serve as the feature extractors for the log_mel spectrum and the waveform images, respectively. The high-level features extracted from the two channels are fused at the middle stage, and the resulting enhanced feature is fed to a double gated recurrent unit (d-GRU) network. Birdsongs of four species from Xeno-Canto were selected for testing. The results show that all three of the following methods improved classification performance: fusing time-domain and time-frequency-domain features, weighted average threshold noise reduction, and extracting birdsong features with different types of feature extractors. The proposed method achieved a mean average precision (MAP) of 95.9% in the classification comparison experiments, an encouraging result.
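For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean average precision can be computed, assuming the common ranking-based definition (average precision per class, then the mean over classes). The abstract does not state which MAP variant the authors used, so the function names `average_precision` and `mean_average_precision`, the species keys, and the toy scores are all hypothetical and serve only as illustration.

```python
from typing import Dict, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision for one class (assumed ranking-based definition).

    scores: classifier confidence that each clip belongs to the class.
    labels: 1 if the clip truly belongs to the class, else 0.
    """
    # Rank clips from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision measured at the rank of each true positive.
            precision_sum += hits / rank
    # A class with no positive clips contributes 0 in this sketch.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    scores_by_class: Dict[str, Sequence[float]],
    labels_by_class: Dict[str, Sequence[int]],
) -> float:
    """Mean of the per-class average precisions."""
    aps = [
        average_precision(scores_by_class[name], labels_by_class[name])
        for name in scores_by_class
    ]
    return sum(aps) / len(aps)


# Toy data (hypothetical): four clips scored against two species.
toy_scores = {
    "species_a": [0.9, 0.8, 0.3, 0.1],
    "species_b": [0.2, 0.7, 0.6, 0.4],
}
toy_labels = {
    "species_a": [1, 0, 1, 0],
    "species_b": [0, 1, 1, 0],
}
toy_map = mean_average_precision(toy_scores, toy_labels)
```

On this toy data the per-class values are 5/6 and 1, so `toy_map` is 11/12. These numbers only illustrate the calculation and are unrelated to the 95.9% reported above.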

Funding

This work was supported in part by the National Natural Science Foundation of China (Grant No. 61703441).

Author information

Corresponding author

Correspondence to Aibin Chen.

About this article

Cite this article

Liu, Z., Chen, W., Chen, A. et al. Birdsong classification based on multi feature channel fusion. Multimed Tools Appl 81, 15469–15490 (2022). https://doi.org/10.1007/s11042-022-12570-3

  • DOI: https://doi.org/10.1007/s11042-022-12570-3
