Asma Alshargabi*
* Corresponding author
Abstract

Speaker recognition aims to identify who is speaking from their voice and is widely used in security, personalization, and archival search. A related, culturally significant task is recognizing Qur'ān reciters from their recitations. The Qur'ān is the central religious text of Islam and is recited with codified pronunciation and melodic rules (tajwīd and maqām). Distinguishing reciters can support digital archiving, educational feedback, and retrieval of stylistically similar recitations. We present a controlled comparison of deep learning approaches for Qur'ān reciter recognition, contrasting feature-based pipelines with end-to-end waveform models under a unified protocol. Using ṣūrah Al-Tawbah recitations from 12 reciters (18,540 clips; fixed 2 s segments), an X-Vector architecture with Mel-Frequency Cepstral Coefficients (MFCCs) attains perfect test performance (accuracy, precision, recall, and F1 all 100%). Convolutional Neural Network (CNN) and Bidirectional LSTM (BLSTM) baselines achieve near-optimal results (99.96% accuracy and F1), while an end-to-end X-Vector trained on raw waveforms reaches 98.77% accuracy (F1 = 0.9877). These findings indicate that explicit spectral features remain advantageous for short segments requiring fine acoustic discrimination, although end-to-end learning is competitive and simplifies preprocessing. We release the curated dataset with standardized splits and training scripts to enable reproducible benchmarking. Overall, feature-informed X-Vectors constitute a strong reference for short-segment reciter identification, and our results motivate hybrid and self-supervised front ends, tajwīd-aware analysis, and real-time, on-device deployment.
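To make the feature-based front end described above concrete, the Python sketch below segments a recitation into the fixed 2 s clips used in the paper and extracts an MFCC matrix per clip with the widely used librosa library. The sampling rate, number of coefficients, and file name are illustrative assumptions, not parameters reported in the abstract.

    import numpy as np
    import librosa

    SR = 16000           # assumed sampling rate (not stated in the abstract)
    SEG_SECONDS = 2.0    # fixed 2 s segment length, as in the paper
    N_MFCC = 20          # assumed number of cepstral coefficients

    def segment_waveform(y, sr=SR, seg_seconds=SEG_SECONDS):
        """Split a mono waveform into non-overlapping fixed-length segments."""
        seg_len = int(sr * seg_seconds)
        n_full = len(y) // seg_len  # drop any trailing partial segment
        return [y[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]

    def mfcc_features(segment, sr=SR, n_mfcc=N_MFCC):
        """Return an (n_mfcc, n_frames) MFCC matrix for one 2 s segment."""
        return librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=n_mfcc)

    # Example usage (file name is hypothetical):
    # y, _ = librosa.load("recitation.wav", sr=SR, mono=True)
    # clips = segment_waveform(y)
    # feats = np.stack([mfcc_features(c) for c in clips])  # (n_clips, n_mfcc, n_frames)

In this pipeline, the resulting MFCC matrices would feed the feature-based X-Vector, CNN, and BLSTM classifiers compared in the paper, whereas the end-to-end X-Vector variant consumes the raw 2 s waveforms directly.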
Keywords: Artificial intelligence; Natural Language Processing Systems; Quran reciter recognition; Deep learning; End-to-End learning
DOI: https://doi.org/10.26555/ijain.v12i1.2288

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.