ANALYSIS OF UZBEK-LANGUAGE AUDIO TEXTS BASED ON MACHINE LEARNING MODELS: THE CASE OF KUN.UZ

Urazaliyeva Mavluda Yangiboyevna

Keywords:

audio text, automatic transcription, phonetic features, morphemic analysis, ASR models, WER, CER.

Abstract

This article analyzes the phonetic and grammatical accuracy of machine learning models that automatically transcribe Uzbek-language audio. The study compares the performance of Whisper, wav2vec 2.0, CTC-based, and Seq2Seq models on spoken-language samples: audio news published on the kun.uz website in 2023-2024. Each model is evaluated by Word Error Rate (WER) and Character Error Rate (CER), with attention to the phonetic complexity of Uzbek, its system of morphological affixes, and stress variation. The results are analyzed to identify the strengths and weaknesses of each model. Drawing on comparisons with international ASR corpora, the article also argues that developing ASR models adapted to the Uzbek language remains a relevant task.
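For readers unfamiliar with the two metrics, the following is a minimal sketch of how WER and CER are conventionally computed: the Levenshtein (edit) distance between the reference transcript and the model output, divided by the reference length in words or characters. It is not the evaluation code used in the study. The function names and the example sentence pair are assumptions introduced here for illustration only.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from reference prefix to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical sentence pair (not taken from the kun.uz data):
# the hypothesis differs from the reference by one character in one word.
reference = "bugun toshkentda yangi maktab ochildi"
hypothesis = "bugun toshkentda yangi maktap ochildi"

print(f"WER = {wer(reference, hypothesis):.3f}")  # 1 of 5 words wrong -> 0.200
print(f"CER = {cer(reference, hypothesis):.3f}")  # 1 of 37 characters wrong -> 0.027
```

In this illustrative pair, a single wrong character makes the whole word count as an error, so WER is much higher than CER. This gap is one reason both metrics are commonly reported together for agglutinative languages, where long affixed word forms leave many places for a one-character slip.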
