Sentiment Analysis

The Serbian Movie Review Dataset (SerbMR)

The Serbian Movie Review Dataset (SerbMR) collection consists of three Serbian movie review datasets constructed for the task of sentiment analysis:
Collected movie reviews in Serbian (ISLRN 252-457-966-231-5) – an unbalanced collection of 4725 movie reviews in Serbian.
SerbMR-2C – The Serbian Movie Review Dataset (2 Classes) (ISLRN 016-049-192-514-1) – a two-class balanced sentiment analysis dataset containing 1682 movie reviews in Serbian (841 positive and 841 negative reviews).
SerbMR-3C – The Serbian Movie Review Dataset (3 Classes) (ISLRN 229-533-271-984-0) – a three-class balanced sentiment analysis dataset containing 2523 movie reviews in Serbian (841 positive, 841 neutral, and 841 negative reviews).
The construction of this dataset collection is described in the LREC 2016 paper.

NBSVM-Weka – a multiclass implementation of the NBSVM classifier for Weka

NBSVM is an algorithm, originally designed for binary text and sentiment classification, that combines the Multinomial Naive Bayes (MNB) classifier with the Support Vector Machine (SVM). It does so by multiplying standard SVM feature vectors element-wise by the MNB log-count ratios, that is, the logarithms of the ratios between the smoothed, normalized feature counts of the positive and negative classes.
This implementation extends the original algorithm to multiclass classification using the one-vs-all approach, in which each class in turn is treated as positive and all remaining classes as negative. It relies on the LIBLINEAR library and its Java wrapper and is designed as a package for Weka. NBSVM-Weka was presented in the LREC 2016 paper.
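
The feature-scaling step described above, together with its one-vs-all extension, can be illustrated with a minimal Python sketch. It is not the NBSVM-Weka code: the function names, the smoothing value alpha = 1.0, the use of raw token counts, and the toy three-review corpus are all assumptions made for illustration, and the SVM training itself (done with LIBLINEAR in the package) is left out.

```python
import math
from collections import Counter


def log_count_ratios(docs, labels, target, alpha=1.0):
    """Log-count ratios for class `target` versus all other classes.

    docs   -- list of tokenised documents (lists of strings)
    labels -- list of class labels, one per document
    alpha  -- additive smoothing constant (assumed value, not from the source)
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        # One-vs-all: the target class is "positive", everything else "negative".
        (pos if label == target else neg).update(doc)
    # Smoothed count vectors p and q.
    p = {t: alpha + pos[t] for t in vocab}
    q = {t: alpha + neg[t] for t in vocab}
    p_norm, q_norm = sum(p.values()), sum(q.values())
    # r = log((p / ||p||_1) / (q / ||q||_1)), computed per token.
    return {t: math.log((p[t] / p_norm) / (q[t] / q_norm)) for t in vocab}


def scale_features(doc, ratios):
    """Element-wise product of a document's count vector with the ratios."""
    counts = Counter(doc)
    return {t: counts[t] * ratios[t] for t in counts if t in ratios}


# Hypothetical toy corpus: one review per class.
docs = [["dobar", "film"], ["dosadan", "film"], ["gledljiv", "film"]]
labels = ["positive", "negative", "neutral"]

# One ratio vector per class; a linear SVM would then be trained per class
# on the correspondingly scaled vectors (not shown here).
ratios_by_class = {c: log_count_ratios(docs, labels, c) for c in sorted(set(labels))}
scaled = scale_features(["dobar", "film"], ratios_by_class["positive"])
```

In this sketch a token that is more frequent in the target class than in the remaining classes receives a positive ratio, and a token that is more frequent elsewhere receives a negative one. At prediction time a one-vs-all scheme would typically select the class whose classifier returns the highest score.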