Stemming

Serbian AutoRIA – A model for automating the RIA mechanism for Serbian

Rapid Integrated Assessment (RIA) is a mechanism developed by the UNDP for evaluating national policy documents, helping countries assess their readiness to implement the UN Sustainable Development Goals (SDGs). The model automates the RIA procedure for documents written in Serbian and builds on an earlier IBM approach developed for English. It searches the documents for sentences or paragraphs that semantically match one of the SDG targets. The model repository also contains the Serbian national policy documents, as well as their stemmed versions. Further information can be found in the LT4All paper.
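To illustrate the matching step, here is a minimal sketch that scores each policy sentence against SDG target texts and keeps the best match above a threshold. It does not reproduce AutoRIA or the IBM approach. It assumes a simple bag-of-stems representation with cosine similarity, and `crude_stem` (fixed-prefix truncation), the `threshold` value, and the sample target texts are all hypothetical stand-ins.

```python
import math
from collections import Counter


def crude_stem(token: str, prefix_len: int = 5) -> str:
    # Hypothetical placeholder stemmer: truncate each word to a fixed prefix.
    # A real pipeline would use a proper Serbian stemmer instead.
    return token.lower()[:prefix_len]


def to_vector(text: str) -> Counter:
    # Bag-of-stems term-frequency vector for a sentence or paragraph.
    tokens = [t.strip(".,;:!?()\"'") for t in text.split()]
    return Counter(crude_stem(t) for t in tokens if t)


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match_targets(sentences, targets, threshold=0.2):
    # For each sentence, find the target with the highest similarity and
    # keep it only if the score reaches the (hypothetical) threshold.
    target_vecs = {tid: to_vector(text) for tid, text in targets.items()}
    matches = []
    for sentence in sentences:
        vec = to_vector(sentence)
        best_id, best_score = max(
            ((tid, cosine(vec, tvec)) for tid, tvec in target_vecs.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            matches.append((sentence, best_id, round(best_score, 3)))
    return matches


# Hypothetical, shortened target texts used only for illustration.
targets = {
    "1.1": "iskorenjivanje ekstremnog siromaštva",
    "4.1": "besplatno osnovno i srednje obrazovanje",
}
sentences = [
    "Strategija predviđa smanjenje siromaštva u ruralnim oblastima.",
    "Ministarstvo finansira obrazovanje dece.",
    "Vreme je lepo.",
]
result = match_targets(sentences, targets)
```

Here the first two sentences are matched to targets 1.1 and 4.1 through the shared stems `sirom` and `obraz`, while the third shares no stems with any target and is dropped. Stemming matters because Serbian inflection would otherwise prevent forms such as "siromaštva" and "siromaštvo" from matching.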

SCStemmers – A collection of stemmers for Serbian and Croatian

SCStemmers is a package containing four stemming algorithms for Serbian and Croatian:
– The greedy and the optimal subsumption-based stemmers for Serbian, by Vlado Kešelj and Danko Šipka,
– A refinement of their greedy stemmer for Serbian, by Nikola Milošević,
– A stemmer for Croatian, by Nikola Ljubešić and Ivan Pandžić.
SCStemmers can be used as a standalone tool or as a plug-in for Weka. The package was presented in the LREC 2016 paper.
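To show the general idea behind greedy suffix-stripping stemmers, here is a minimal sketch that removes the longest matching suffix while keeping a minimum stem length. This is not the Kešelj–Šipka, Milošević, or Ljubešić–Pandžić algorithm. The short `SUFFIXES` list and the `min_stem` parameter are hypothetical and far smaller than any real rule set.

```python
# Hypothetical, heavily abbreviated suffix list (not the SCStemmers rules).
SUFFIXES = ["ovima", "ama", "ima", "om", "a", "e", "i", "u"]


def greedy_stem(word: str, suffixes=SUFFIXES, min_stem: int = 2) -> str:
    word = word.lower()
    # Try suffixes from longest to shortest and strip the first one that
    # matches, provided at least min_stem characters remain.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    # No suffix applies: return the (lowercased) word unchanged.
    return word


stems = [greedy_stem(w) for w in ["gradovima", "knjigama", "Grad", "ja"]]
```

Here `stems` is `['grad', 'knjig', 'grad', 'ja']`: "ja" keeps its final "a" because stripping it would leave a stem shorter than `min_stem`. Taking the longest match first is what makes the procedure greedy. It commits to one suffix without comparing alternative segmentations.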