Sentiment Classification of Documents in Serbian: The Effects of Morphological Normalization and Word Embeddings


Two open issues in the sentiment classification of texts written in Serbian are the effect of different forms of morphological normalization and the usefulness of large amounts of unlabeled text. In this paper, we assess the impact of lemmatizers and stemmers for Serbian on classifiers trained and evaluated on the Serbian Movie Review Dataset. We also consider the effectiveness of using word embeddings, generated from a large unlabeled corpus, as classification features.
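
As a rough illustration of how word embeddings can serve as classification features, and of why morphological normalization may interact with them, the sketch below averages the vectors of a document's tokens into one fixed-length feature vector. This is one common way to build such features and is not necessarily the method used in the paper. The names `normalize` and `document_vector`, the two-dimensional toy embeddings, and the tiny lemma table are all hypothetical and chosen only for this example.

```python
from typing import Dict, List

# Hypothetical toy data: real embeddings would be learned from a large
# unlabeled corpus and would have far more dimensions.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "dobar": [1.0, 0.0],  # "good"
    "film": [0.0, 1.0],   # "film"
}

# Hypothetical lemma table standing in for a real Serbian lemmatizer.
TOY_LEMMAS: Dict[str, str] = {
    "dobri": "dobar",
    "filmovi": "film",
}


def normalize(tokens: List[str], lemmas: Dict[str, str]) -> List[str]:
    """Map each token to its lemma when one is known, else keep it as-is."""
    return [lemmas.get(token, token) for token in tokens]


def document_vector(
    tokens: List[str], embeddings: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the embeddings of the known tokens in a document.

    Tokens without an embedding are skipped. If no token is known,
    a zero vector of length `dim` is returned.
    """
    known = [embeddings[token] for token in tokens if token in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    review = ["dobri", "filmovi"]  # inflected forms, absent from the toy table
    print(document_vector(review, TOY_EMBEDDINGS, 2))
    print(document_vector(normalize(review, TOY_LEMMAS), TOY_EMBEDDINGS, 2))
```

In this toy setup the inflected forms have no embedding, so the unnormalized review collapses to a zero vector, while the lemmatized review picks up both word vectors. The resulting vector could then be passed to any standard classifier.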

Telfor Journal, Vol. 9, No. 2, pp. 104-109