Features of Using Extended Forms of the Transformer Model in Kazakh Speech Recognition

Тұрдыбек Құрметқан

Authors

Name	Affiliation
Тұрдыбек Құрметқан	Al-Farabi Kazakh National University

Published:

2025-07-06

Issue:

No. 2 (2025): "D. Serikbayev EKTU Bulletin"

Section:

Information and communication technologies

Article language:

Kazakh

Keywords:

automatic speech recognition, Transformer, Conformer, Hiformer, convolutional neural network, deep learning, Kazakh speech recognition.

Abstract

This article gives an overview of automatic speech recognition (ASR) technologies and of the models used for Kazakh speech recognition. It describes the use of two improved variants of the Transformer model, Conformer and Hiformer, for recognizing Kazakh speech, and presents the structure and architecture of both. To assess their effectiveness, the two models were compared with the Transformer and CNN models previously used for Kazakh speech recognition. According to the results, the Conformer model outperformed the Transformer and CNN models, and the Hiformer model achieved a significantly better result than in our previous study. In testing, the Hiformer model achieved the highest performance on Kazakh speech, and the word error rate (WER) decreased from 6.5 to 11.4 percent.
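
For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the evaluation code used in the study; the function name `word_error_rate` and the sample sentences are hypothetical, and whitespace tokenization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace; no text
    normalization (case folding, punctuation removal) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of four.
reference = "мен бүгін мектепке бардым"
hypothesis = "мен бүгін мектепте бардым"
print(f"WER = {word_error_rate(reference, hypothesis):.2%}")
```

Because insertions count as errors, WER computed this way can exceed 100 percent when the hypothesis is much longer than the reference.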

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Citation

Құрметқан, Т. (2025). Features of using extended forms of the Transformer model in Kazakh speech recognition. D. Serikbayev EKTU Bulletin, (2). Retrieved from https://journals.ektu.kz/vestnik/article/view/1040