DEVELOPING AN NLP SYSTEM FOR THE KARAKALPAK LANGUAGE BASED ON THE UZBEK LANGUAGE

Authors

  • Otemisov Aziz Zarlıqbaevich
  • Xudaybergenova Gozzal Kenesbay qizi

Keywords:

natural language processing (NLP), artificial intelligence (AI), machine translation, transfer learning, morphological analysis, syntactic analysis, parallel corpus, lemmatizer, tokenizer.

Abstract

This article presents a comparative analysis of natural language processing (NLP) technologies for the Uzbek and Karakalpak languages. First, it examines existing NLP projects for the Uzbek language, particularly the potential of models such as UzBERT and BBPOS, as well as the contribution of linguistic corpora (UZCorpus, Universal Dependencies) to semantic, syntactic, and morphological analysis. The study highlights the lack of scientific and technical resources for NLP in the Karakalpak language, especially in terms of lemmatized and annotated corpora and analytical tools. Furthermore, practical solutions are proposed, including the creation of parallel corpora, fine-tuning of existing models, and the adaptation of lemmatizers and tokenizers.

References

1. Mansurov Sh. (2021). UzBERT: Pretraining a BERT model for Uzbek. arXiv:2108.09814.

2. Bobojonova M. et al. (2023). BBPOS: O‘zbek tilida so‘z turkumlarini aniqlovchi neyron tarmoq modeli [BBPOS: A neural network model for part-of-speech tagging in Uzbek].

3. Mamasaidov I., Shopulatov Z. (2022). Open Language Data for Low-Resource Turkic Languages.

4. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

5. Tiedemann, J. (2012). Parallel Data, Tools and Interfaces in OPUS. In Proceedings of LREC 2012.

6. Mengliev, D., Barakhnin, V., & Abdurakhmonova, N. (2021). Development of intellectual web system for morph analyzing of Uzbek words. Applied Sciences, 11(19), 9117.

7. Abdurakhmonova, N. (2019). Dependency parsing based on Uzbek Corpus. In Proceedings of the International Conference on Language Technologies for All (LT4All).

Published

2025-08-04
