FEATURES OF USING EXTENDED FORMS OF THE TRANSFORMER MODEL IN KAZAKH SPEECH RECOGNITION
Published:
2025-07-06
Article language:
Kazakh
Keywords:
automatic speech recognition, Transformer, Conformer, Hiformer, convolutional neural network, deep learning, Kazakh speech recognition.
Abstract
This article provides an overview of the automatic speech recognition (ASR) technologies and models used in Kazakh speech recognition. It describes the application of two extended forms of the Transformer model, Conformer and Hiformer, to Kazakh speech recognition and presents the structure and architecture of both models. To determine their effectiveness, the models were compared with the Transformer and CNN models previously used in Kazakh speech recognition. According to the results of the study, the Conformer model outperformed the Transformer and CNN models in recognizing Kazakh speech, and the Hiformer model achieved a significantly higher result than in our previous study. During testing with the Hiformer model, which achieved the highest performance in recognizing Kazakh speech, the word error rate (WER) decreased from 6.5 to 11.4 percent.
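For readers unfamiliar with the metric, the word error rate reported above is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognized text and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in the study. The function name `word_error_rate` and the sample Kazakh sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization is a plain
    whitespace split, which is an assumption, not the study's method.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of four reference words is dropped.
print(word_error_rate("бүгін ауа райы жақсы", "бүгін ауа жақсы"))  # 0.25
```

A WER of 0.25 corresponds to 25 percent. Lower values indicate better recognition, and the value can exceed 1.0 when the hypothesis contains many inserted words.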
License
Copyright (c) 2025 ШҚТУ Хабаршысы
This work is licensed under a Creative Commons Attribution 4.0 International License.