MODERN APPROACHES TO TOPIC MODELING OF CORPUS TEXTS

Authors

  • Aloyev Narzillo Raxmatilloyevich

Keywords:

Topic Modeling, Machine Learning, Latent Dirichlet Allocation (LDA), Linguistic Corpus, Topic Classification.

Abstract

Natural language processing (NLP) is used mainly for text processing and underlies applications such as chatbots, automatic editing and analysis, speech recognition, machine translation, social network monitoring, and email filtering. In NLP, topic modeling is a family of algorithms that can be used to automatically summarize large collections of text. Text data has a very large number of features, which makes models harder to train and reduces their performance. Today, topic modeling and dimensionality reduction for language corpus texts can be performed with a range of algorithms, including LDA, Non-negative Matrix Factorization (NMF), LSA, PLSA, LDA2vec, and BERTopic. This article describes topic modeling, its areas of application, methods, and approaches; it then identifies topics by intelligent processing of language corpus texts with the LDA method and visualizes them with the t-SNE method.
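
To make the LDA step concrete, the following is a minimal sketch of topic inference by collapsed Gibbs sampling, written in plain Python. It is not the article's implementation: the toy corpus, the function names `lda_gibbs` and `top_words`, and the hyperparameter values are all illustrative assumptions, and a real corpus would normally be handled with an established library after tokenization and lemmatization. The t-SNE visualization step is not shown.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns (theta, phi, vocab), where
    theta[d][k] is the topic mixture of document d and phi[k][w] is the
    word distribution of topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]        # n(d, k)
    topic_word = [[0] * V for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                      # n(k)

    # Give every token a random initial topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # p(z = t | all other assignments), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed estimates of the document-topic and topic-word distributions.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(n + beta) / (topic_total[k] + V * beta) for n in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, vocab


def top_words(phi, vocab, n=3):
    """Return the n most probable words of each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in phi
    ]


# Hypothetical, already tokenized toy corpus.
docs = [
    ["corpus", "text", "word", "lemma", "corpus"],
    ["word", "lemma", "text", "corpus"],
    ["bank", "loan", "money", "bank"],
    ["money", "loan", "bank", "credit"],
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
```

Each row of `theta` is a low-dimensional representation of one document, so an array of these rows is the kind of input that a t-SNE projection would take for plotting; `top_words(phi, vocab)` lists the words that characterize each topic.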

Published

2025-08-04
