An Effectiveness of Repeating a Spoken Digit for Speaker Verification

  • Conference paper
Recent Challenges in Intelligent Information and Database Systems (ACIIDS 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1863)

Abstract

In recent years, speaker verification has attracted significant research attention, and deep neural networks have dramatically improved the performance of speaker verification systems. This paper investigates the effect of a predefined passphrase on system performance, using the state-of-the-art ECAPA-TDNN model for speaker modeling. Our study focuses on identifying the most discriminative passphrases among spoken digits through text-dependent speaker verification trials. By comparing performance across the spoken digits and considering their pronunciations, we analyze how the choice of passphrase in the human voice influences the speaker verification system. Furthermore, we find that repeating a digit, up to a certain number of times, incrementally improves the system's accuracy. Overall, our study has significant implications for advancing speaker authentication systems, especially those deployed on embedded devices, which have limited computational resources and must be optimized in many respects.
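Since the full text is paywalled on this page, the following is only a minimal sketch of the kind of trial the abstract describes: extract an ECAPA-TDNN speaker embedding per utterance, average the embeddings over repeated utterances of the same digit at enrollment, and score a text-dependent trial by cosine similarity. It assumes the publicly released SpeechBrain checkpoint speechbrain/spkrec-ecapa-voxceleb (trained text-independently on VoxCeleb) rather than the authors' own model, and the file names and acceptance threshold are hypothetical.

```python
# Minimal sketch, not the authors' pipeline: scores a repeated-digit trial with
# the public SpeechBrain ECAPA-TDNN checkpoint (an assumption on our part).
import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Pretrained ECAPA-TDNN speaker-embedding extractor (192-dim embeddings).
encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_ecapa",
)

def embed(path: str) -> torch.Tensor:
    """Return one L2-normalized speaker embedding for a single utterance."""
    wav, sr = torchaudio.load(path)                       # (channels, samples)
    wav = torchaudio.functional.resample(wav, sr, 16000)  # model expects 16 kHz
    wav = wav.mean(dim=0, keepdim=True)                   # downmix to mono: (1, samples)
    emb = encoder.encode_batch(wav).squeeze()             # (192,)
    return emb / emb.norm()

def enroll(paths: list[str]) -> torch.Tensor:
    """Average embeddings over repeated utterances of the same digit."""
    mean = torch.stack([embed(p) for p in paths]).mean(dim=0)
    return mean / mean.norm()

# Hypothetical files: the digit "five" repeated three times at enrollment,
# plus one test utterance claimed to come from the same speaker.
enrolled = enroll(["five_rep1.wav", "five_rep2.wav", "five_rep3.wav"])
score = torch.dot(enrolled, embed("five_test.wav")).item()  # cosine similarity
print("accept" if score > 0.5 else "reject")  # 0.5 is an illustrative threshold
```

Averaging L2-normalized embeddings over repetitions reduces the variance of the enrollment representation, which is one plausible reading of the abstract's finding that repeating a digit helps only up to a certain number of times, after which the averaged estimate stops improving.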

D. Vo and S. M. Le—Equal contribution.



Acknowledgments

This research was supported by funding from the Faculty of Information Technology, University of Science, Vietnam National University - Ho Chi Minh City.

Author information

Corresponding author

Correspondence to Duy Vo.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Vo, D., Le, S.M., Do, H.D., Tran, S.T. (2023). An Effectiveness of Repeating a Spoken Digit for Speaker Verification. In: Nguyen, N.T., et al. Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2023. Communications in Computer and Information Science, vol 1863. Springer, Cham. https://doi.org/10.1007/978-3-031-42430-4_50

  • DOI: https://doi.org/10.1007/978-3-031-42430-4_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42429-8

  • Online ISBN: 978-3-031-42430-4

  • eBook Packages: Computer Science, Computer Science (R0)
