Text Classification

SCStemmers – A collection of stemmers for Serbian and Croatian

SCStemmers is a package containing four stemming algorithms for Serbian and Croatian:
– The greedy and the optimal subsumption-based stemmers for Serbian, by Vlado Kešelj and Danko Šipka,
– A refinement of their greedy stemmer for Serbian, by Nikola Milošević,
– A stemmer for Croatian, by Nikola Ljubešić and Ivan Pandžić.
SCStemmers can be used as a standalone tool or as a plug-in for Weka. The package was presented in a paper at LREC 2016.
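
To make greedy suffix stripping concrete, here is a minimal generic sketch. It removes the longest suffix from a fixed list that still leaves a stem of a minimum length. The suffix list, the minimum stem length and the function name `greedy_stem` are hypothetical illustrations. This is not the algorithm or rule set of any stemmer in SCStemmers, and it makes no claim about how those stemmers select suffixes.

```python
# Toy suffix list (hypothetical, for illustration only; NOT the rule set
# shipped with SCStemmers). Real stemmers use far larger suffix sets.
SUFFIXES = ["ama", "ima", "om", "em", "a", "e", "i", "u", "o"]

# Hypothetical guard so that very short words are not stripped to nothing.
MIN_STEM_LENGTH = 2


def greedy_stem(word: str, suffixes=SUFFIXES, min_stem: int = MIN_STEM_LENGTH) -> str:
    """Strip the longest listed suffix that leaves at least `min_stem` characters."""
    word = word.lower()
    # Try suffixes from longest to shortest: the first match is the greedy choice.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word  # no suffix matched: the word is its own stem


if __name__ == "__main__":
    for w in ["knjiga", "knjigama", "knjigom", "grad"]:
        print(w, "->", greedy_stem(w))  # the three forms of "knjiga" all map to "knjig"
```

A list this small easily under-stems: it maps "gradovima" to "gradov" rather than "grad", which is why practical stemmers need much larger suffix sets and additional rules.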

NBSVM-Weka – a multiclass implementation of the NBSVM classifier for Weka

NBSVM is an algorithm, originally designed for binary text and sentiment classification, that combines the Multinomial Naive Bayes (MNB) classifier with the Support Vector Machine (SVM). It does so by multiplying standard SVM feature vectors element-wise by the MNB log-count ratios, i.e. the logarithms of the ratios between each feature's smoothed, normalized counts in the positive and in the negative class.
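
For reference, the binary formulation by Wang and Manning (2012), who introduced NBSVM, can be written as below. Here f^(i) is the feature count vector of document i, y^(i) in {+1, -1} its label, alpha a smoothing parameter, x̂^(i) the feature vector given to the SVM (binarized in the original paper), and ∘ the element-wise product. This summarizes the published binary algorithm and does not necessarily reflect every detail of NBSVM-Weka. The second line is a hypothetical two-feature toy example (features "great" and "awful", alpha = 1, positive documents containing "great" 3 times and "awful" 0 times, negative documents containing them 0 and 4 times).

```latex
p = \alpha\mathbf{1} + \sum_{i:\,y^{(i)}=+1} f^{(i)}, \qquad
q = \alpha\mathbf{1} + \sum_{i:\,y^{(i)}=-1} f^{(i)}, \qquad
r = \log\frac{p/\lVert p\rVert_1}{q/\lVert q\rVert_1}, \qquad
\tilde{x}^{(i)} = r \circ \hat{x}^{(i)}

p = (4, 1),\; q = (1, 5) \;\Rightarrow\;
r = \left(\log\frac{4/5}{1/6},\; \log\frac{1/5}{5/6}\right) = (\log 4.8,\; \log 0.24) \approx (1.57,\; -1.43)
```

Features typical of the positive class thus get positive weights in r and those typical of the negative class get negative ones, before the SVM is trained on the scaled vectors.
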
This implementation extends the original algorithm to multiclass classification using the one-vs-all approach. It relies on the LIBLINEAR library and its Java wrapper and is designed as a package for Weka. NBSVM-Weka was presented in a paper at LREC 2016.
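
The one-vs-all extension can be sketched as follows. For each class c, documents of c are treated as positive and all others as negative, a class-specific ratio vector r_c is computed, and a binary linear model is trained on the r_c-scaled features. A new document is then assigned, in the usual one-vs-all fashion, the class whose model gives the highest decision value. This is a minimal pure-Python sketch under stated assumptions: the function names and toy data are hypothetical, features are raw counts, and a plain perceptron stands in for the LIBLINEAR SVM solver that NBSVM-Weka actually uses. It illustrates the structure and does not reproduce the package.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def log_count_ratio(X: Sequence[Vector], y: Sequence[int], alpha: float = 1.0) -> Vector:
    """Binary MNB log-count ratio r = log((p/||p||_1) / (q/||q||_1)), y in {+1, -1}."""
    p = [alpha] * len(X[0])  # smoothed counts in the positive class
    q = [alpha] * len(X[0])  # smoothed counts in the negative class
    for row, label in zip(X, y):
        counts = p if label == 1 else q
        for j, value in enumerate(row):
            counts[j] += value
    p_sum, q_sum = sum(p), sum(q)  # L1 norms (all entries are positive)
    return [math.log((pj / p_sum) / (qj / q_sum)) for pj, qj in zip(p, q)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def train_binary(X: Sequence[Vector], y: Sequence[int], epochs: int = 20) -> Tuple[Vector, float]:
    """Stand-in linear learner: a plain perceptron, NOT the LIBLINEAR SVM solver."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            if label * (dot(w, x) + b) <= 0:  # misclassified or on the boundary
                w = [wj + label * xj for wj, xj in zip(w, x)]
                b += label
                mistakes += 1
        if mistakes == 0:  # training data separated; stop early
            break
    return w, b


def train_one_vs_all(X: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Tuple[Vector, Vector, float]]:
    """Train one binary NB-scaled model per class: that class vs. all the others."""
    models = {}
    for c in sorted(set(labels)):
        y = [1 if label == c else -1 for label in labels]
        r = log_count_ratio(X, y)  # class-specific ratio vector r_c
        scaled = [[rj * xj for rj, xj in zip(r, x)] for x in X]
        w, b = train_binary(scaled, y)
        models[c] = (r, w, b)
    return models


def predict(models: Dict[str, Tuple[Vector, Vector, float]], x: Vector) -> str:
    """Return the class whose binary model gives the highest decision value."""
    def score(c: str) -> float:
        r, w, b = models[c]
        return dot(w, [rj * xj for rj, xj in zip(r, x)]) + b
    return max(models, key=score)


if __name__ == "__main__":
    # Hypothetical toy counts over [goal, match, vote, election, film, actor].
    X = [[2, 1, 0, 0, 0, 0], [1, 2, 0, 0, 0, 0],
         [0, 0, 2, 1, 0, 0], [0, 0, 1, 2, 0, 0],
         [0, 0, 0, 0, 2, 1], [0, 0, 0, 0, 1, 2]]
    labels = ["sport", "sport", "politics", "politics", "film", "film"]
    models = train_one_vs_all(X, labels)
    print(predict(models, [0, 0, 1, 0, 0, 0]))  # a document containing only "vote": prints "politics"
```

Each class keeps its own ratio vector because "positive" and "negative" change with every binary subproblem; a feature that signals one class is scaled up only in that class's model.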