Abstract
Speaker verification has attracted significant research attention in recent years, and deep neural networks have dramatically improved the performance of speaker verification systems. This paper investigates the effect of a predefined passphrase on system performance, using the state-of-the-art ECAPA-TDNN model for speaker modeling. Our study focuses on identifying the most discriminative passphrases among spoken digits through text-dependent speaker verification trials. By comparing verification performance across spoken digits and considering their pronunciations, we analyze how the choice of passphrase affects the speaker verification system. Furthermore, we find that repeating a spoken digit, incrementally up to a certain number of repetitions, improves the system's accuracy. Overall, our findings are relevant to advancing speaker authentication systems, especially those deployed on embedded devices, which have limited computational resources and must be optimized in many respects.
D. Vo and S. M. Le—Equal contribution.
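To make the trial setup described in the abstract concrete, here is a minimal sketch of a text-dependent verification trial in which a spoken digit is repeated n times before embedding extraction. It assumes SpeechBrain's publicly released ECAPA-TDNN checkpoint (speechbrain/spkrec-ecapa-voxceleb); the file names, repeat count, and decision threshold are illustrative assumptions, not values from the paper, and this is not the authors' exact pipeline.

```python
# Sketch: score a text-dependent speaker verification trial with a
# repeated spoken digit, using SpeechBrain's pretrained ECAPA-TDNN.
# Illustration only; not the paper's training or evaluation setup.
import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Pretrained ECAPA-TDNN speaker encoder (expects 16 kHz mono audio).
encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb")

def embed(wav_path: str, repeats: int = 1) -> torch.Tensor:
    """Extract an ECAPA-TDNN embedding; `repeats` concatenates the
    digit recording with itself to mimic a repeated passphrase."""
    waveform, _sr = torchaudio.load(wav_path)  # shape: (1, samples)
    waveform = waveform.repeat(1, repeats)     # repeat along time axis
    return encoder.encode_batch(waveform).squeeze()

def verify(enroll_path: str, test_path: str,
           repeats: int = 1, threshold: float = 0.5) -> bool:
    """Accept the trial if the cosine similarity between enrollment
    and test embeddings exceeds the decision threshold."""
    score = torch.nn.functional.cosine_similarity(
        embed(enroll_path, repeats), embed(test_path, repeats), dim=0)
    return score.item() > threshold

# Hypothetical file names, for illustration only:
# print(verify("enroll_digit3.wav", "test_digit3.wav", repeats=3))
```

Under this setup, increasing `repeats` lengthens the utterance seen by the encoder, which is one plausible reading of why repetition up to a point improves accuracy.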
References
Becker, S., Ackermann, M., Lapuschkin, S., Müller, K., Samek, W.: Interpreting and explaining deep neural networks for classification of audio signals. CoRR abs/1807.03418 (2018). http://arxiv.org/abs/1807.03418
Rezaur Rahman Chowdhury, F.A., Wang, Q., Moreno, I.L., Wan, L.: Attention-based models for text-dependent speaker verification. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5359–5363 (2018). https://doi.org/10.1109/ICASSP.2018.8461587
Chung, J.S., et al.: VoxSRC 2019: the first VoxCeleb speaker recognition challenge (2019). https://doi.org/10.48550/ARXIV.1912.02522
Chung, J.S., Nagrani, A., Zisserman, A.: VoxCeleb2: deep speaker recognition. In: Interspeech 2018. ISCA (2018). https://doi.org/10.21437/interspeech.2018-1929
Deng, J., Guo, J., Xue, N., Zafeiriou, S.: ArcFace: additive angular margin loss for deep face recognition. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4685–4694 (2019). https://doi.org/10.1109/CVPR.2019.00482
Desplanques, B., Thienpondt, J., Demuynck, K.: ECAPA-TDNN: emphasized channel attention, propagation and aggregation in TDNN based speaker verification. In: Interspeech 2020. ISCA (2020). https://doi.org/10.21437/interspeech.2020-2650
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015). https://doi.org/10.48550/ARXIV.1512.03385
Heigold, G., Moreno, I., Bengio, S., Shazeer, N.: End-to-end text-dependent speaker verification (2015). https://doi.org/10.48550/ARXIV.1509.08062
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018). https://doi.org/10.1109/CVPR.2018.00745
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2014). https://doi.org/10.48550/ARXIV.1412.6980
Ko, T., Peddinti, V., Povey, D., Seltzer, M.L., Khudanpur, S.: A study on data augmentation of reverberant speech for robust speech recognition. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5220–5224 (2017). https://doi.org/10.1109/ICASSP.2017.7953152
Nagrani, A., Chung, J.S., Zisserman, A.: VoxCeleb: a large-scale speaker identification dataset. In: Interspeech 2017. ISCA (2017). https://doi.org/10.21437/interspeech.2017-950
Ozaydin, S.: An isolated word speaker recognition system. In: 2017 International Conference on Electrical and Computing Technologies and Applications (ICECTA), pp. 1–5 (2017). https://doi.org/10.1109/ICECTA.2017.8251987
Park, D.S., et al.: SpecAugment: a simple data augmentation method for automatic speech recognition. In: Interspeech 2019. ISCA (2019). https://doi.org/10.21437/interspeech.2019-2680
Snyder, D., Chen, G., Povey, D.: MUSAN: a music, speech, and noise corpus (2015). https://doi.org/10.48550/ARXIV.1510.08484
Snyder, D., Garcia-Romero, D., Sell, G., Povey, D., Khudanpur, S.: X-vectors: robust DNN embeddings for speaker recognition. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5329–5333 (2018). https://doi.org/10.1109/ICASSP.2018.8461375
Acknowledgments
This research is supported by research funding from the Faculty of Information Technology, University of Science, Vietnam National University - Ho Chi Minh City.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Vo, D., Le, S.M., Do, H.D., Tran, S.T. (2023). An Effectiveness of Repeating a Spoken Digit for Speaker Verification. In: Nguyen, N.T., et al. Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2023. Communications in Computer and Information Science, vol 1863. Springer, Cham. https://doi.org/10.1007/978-3-031-42430-4_50
Print ISBN: 978-3-031-42429-8
Online ISBN: 978-3-031-42430-4