Implementation of Deep Learning in a Voice Recognition System for Virtual Assistants

Apriyanto Apriyanto; Rohmat Sahirin; Snyder Bradford

doi:10.70177/jsca.v2i6.1533

Apriyanto Apriyanto ⁽¹⁾, Rohmat Sahirin ⁽²⁾, Snyder Bradford ⁽³⁾

(1) Politeknik Tunas Pemuda, Indonesia

(2) Universitas Pendidikan Indonesia, Indonesia

(3) International University of Monaco, Monaco


Issue
Vol. 2 No. 6 (2024)

Submitted
15 November 2024

Published
30 December 2024

Keywords:

Convolutional Neural Network, Deep Learning, Speech Recognition Accuracy, Virtual Assistants, Voice Recognition


Abstract

Voice recognition has become a vital component of virtual assistants, enabling more natural and efficient user interaction. Traditional voice recognition systems, however, struggle to interpret diverse accents and dialects and to cope with background noise, which limits their usability. This study investigates deep learning techniques for improving the accuracy and adaptability of voice recognition in virtual assistant applications, using models that can process complex speech patterns and adapt to varied linguistic nuances. A convolutional neural network (CNN) combined with a recurrent neural network (RNN) was trained on a large, diverse dataset of audio samples covering multiple languages, accents, and noisy environments, chosen to test the robustness of the model. Compared with traditional voice recognition systems, the model achieved a 25% improvement in word error rate (WER) and a significant increase in recognition accuracy across diverse voice inputs. It also adapted well to varying acoustic conditions, interpreting speech accurately and improving the user experience with virtual assistants. These findings suggest that deep learning can significantly enhance voice recognition systems and deliver more reliable performance in real-world applications, narrowing the gap between human and machine communication and making virtual assistants more accessible and user-friendly.
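
Because the headline result is stated in terms of word error rate, a short sketch of how WER is conventionally computed may help in reading it. The code below is not taken from the article: it uses the standard definition (word-level edit distance divided by the number of reference words), and the function names and sample transcripts are hypothetical. The abstract does not say whether the 25% figure is a relative or an absolute change, so the second helper only illustrates the relative reading, with made-up baseline and new WER values.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative improvement of new_wer over baseline_wer (assumed interpretation)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts: one deleted word ("on") and one substitution.
    wer = word_error_rate("turn on the kitchen lights", "turn the kitchen light")
    print(f"WER: {wer:.2f}")
    # Hypothetical baseline and new WER values, for illustration only.
    print(f"Relative reduction: {relative_wer_reduction(0.20, 0.15):.2f}")
```

Under the relative reading, a drop from a WER of 0.20 to 0.15 counts as a 25% improvement; under an absolute reading the same phrase would mean a drop of 25 percentage points, which is a much larger change.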


Authors

Apriyanto Apriyanto

Politeknik Tunas Pemuda

irapriyanto0604@gmail.com (Primary Contact)

Rohmat Sahirin

Universitas Pendidikan Indonesia

Snyder Bradford

International University of Monaco

Apriyanto, A., Sahirin, R., & Bradford, S. (2024). Implementation of Deep Learning in a Voice Recognition System for Virtual Assistants. Journal of Computer Science Advancements, 2(6), 349–363. https://doi.org/10.70177/jsca.v2i6.1533


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
