Implementation of Deep Learning in a Voice Recognition System for Virtual Assistants
Abstract
Voice recognition has become a vital component of virtual assistants, enabling more natural and efficient user interaction. Traditional voice recognition systems, however, struggle to interpret diverse accents and dialects and to cope with background noise, which limits their usability. This study investigates deep learning techniques for improving the accuracy and adaptability of voice recognition in virtual assistant applications, using models that can process complex speech patterns and adapt to varied linguistic nuances. A convolutional neural network (CNN) combined with a recurrent neural network (RNN) was trained on a large, diverse dataset of audio samples. The dataset covered multiple languages, accents, and noisy environments in order to test the robustness of the model. Results indicate a 25% improvement in word error rate (WER), that is, a lower WER, and a significant increase in recognition accuracy across diverse voice inputs compared with traditional voice recognition systems. The model also proved highly adaptable, accurately interpreting speech under varying acoustic conditions and thereby improving the user experience with virtual assistants. These findings suggest that deep learning can significantly enhance voice recognition systems and deliver more reliable performance in real-world applications. Deep learning models can thus help bridge the gap between human and machine communication, making virtual assistants more accessible and user-friendly.
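The abstract reports its headline result as word error rate but does not say how WER was computed, or whether the 25% figure is a relative or an absolute change. As a minimal sketch, assuming the conventional definition (word-level edit distance divided by the number of reference words), the metric can be computed as below. The function names `word_error_rate` and `relative_wer_reduction` and the sample sentences are hypothetical illustrations, not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumes the conventional definition; the study's own scoring
    procedure is not described in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (hypothetical helper)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Illustrative transcripts only; these are not data from the study.
    wer = word_error_rate("turn on the kitchen lights", "turn on kitchen light")
    print(f"WER: {wer:.2f}")
```

Under a relative reading, a baseline WER of 0.20 falling to 0.15 would count as a 25% improvement; under an absolute reading the same figure would mean a drop of 25 percentage points. The two readings differ considerably, so it matters which one a reported number uses.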
Copyright (c) 2024 Apriyanto Apriyanto, Rohmat Sahirin, Snyder Bradford

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.