STEMMING ALGORITHMS FOR UZBEK LANGUAGE TEXTS

Sharipov Maqsud Siddiqovich; Sattarova Surayyo Beknazarovna

Keywords:

stemming, Uzbek language, Snowball algorithm, natural language processing, word normalization, text analysis.

Abstract

This paper discusses stemming, a key technique in natural language processing that reduces words to their stem (root) form. Stemming plays an important role in automatic text analysis, search engines, translation systems, and document classification. Effective stemming algorithms exist for English, Russian, and German, but efficient stemmers for the Uzbek language are still lacking. The paper therefore explores, at a theoretical level, the idea of building an Uzbek stemmer based on the Snowball algorithm. It highlights the differences between stemming, tokenization, and lemmatization, and discusses application areas for Uzbek such as chatbots, machine translation, text analysis, social media monitoring, and digital dictionary creation. It also addresses the prospects for developing high-performance Uzbek stemmers and integrating them into artificial intelligence systems.
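To make the distinction between tokenization and stemming concrete, the following is a minimal sketch of rule-based suffix stripping in Python. It is not the authors' algorithm and is not written in the Snowball language. The suffix list, the `min_stem_len` threshold, and the function names `tokenize` and `stem` are illustrative assumptions. A realistic Uzbek stemmer would need a far larger affix inventory and rules for ambiguous endings.

```python
import re

# Toy suffix inventory (an assumption for illustration, not the paper's rule set):
# a few common Uzbek noun suffixes for plural, possession, and case.
SUFFIXES = sorted(
    ["lar", "imiz", "ingiz", "ning", "ni", "ga", "da", "dan"],
    key=len,
    reverse=True,  # try longer suffixes first
)


def tokenize(text):
    """Split text into lowercase word tokens, keeping word-internal apostrophes."""
    return re.findall(r"[^\W\d_]+(?:[ʻ‘’'][^\W\d_]+)*", text.lower())


def stem(word, min_stem_len=3):
    """Repeatedly strip the longest matching suffix while the stem stays long enough."""
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
                word = word[: -len(suffix)]
                stripped = True
                break
    return word


if __name__ == "__main__":
    # Tokenization only splits the text; stemming then reduces each token.
    for token in tokenize("Kitoblarimizdan, maktabda, shaharning"):
        print(token, "->", stem(token))
```

In this sketch, tokenization only separates the text into words, while stemming removes stacked suffixes one at a time, so `kitoblarimizdan` is reduced in three steps (`-dan`, `-imiz`, `-lar`). The minimum stem length is a crude guard against over-stemming short words that happen to end in a suffix-like string. Lemmatization would instead require dictionary or morphological knowledge, which this sketch does not attempt.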
