MODELING MORPHOLOGICAL ANALYSIS BASED ON WORD-ENDING FOR UZBEK LANGUAGE

18.11.2023 International Scientific Journal "Science and Innovation". Series C. Volume 2 Issue 11

Ulugbek Salaev

Abstract. Uzbek, an agglutinative language, forms words by combining affixes with roots, utilizing inflectional endings for various morphological features. This property makes a large number of combinations of word ending, and greatly increases the word-vocabulary size, and data sparseness problems for statistical models. This paper discusses a morphological analyzing model which includes stemming, lemmatizing and extraction of morphological information considering morpho-phonetic exceptions. A main point of the model involves developing a complete set of word-ending with assign morphological information, and additional datasets for morphological analysis. The proposed model was evaluated using a curated test set comprising 5.3K words. It achieved a word-level accuracy over 91%, as determined through manual verification of stem, lemma, and morphological feature corrections conducted by linguistic experts. The created tool based on the proposed methodology is available as an open-source Python package, as well as a web-based application including a public API

Keywords: uzbek language, morphological analyzing, morphological segmentation, stemming, lemmatizing, bound morpheme, inflectional ending