Sentiment Analysis | Vuk Batanović

SentiComments.SR - A Sentiment Analysis Dataset of Comments in Serbian

The SentiComments.SR dataset includes the following three corpora of short texts annotated for the task of sentiment analysis:
The main SentiComments.SR corpus, consisting of 3490 movie-related comments;
The movie verification corpus, consisting of 464 movie-related comments;
The book verification corpus, consisting of 173 book-related comments.
Six sentiment labels were used in dataset annotation: +1, -1, +M, -M, +NS, and -NS, with the addition of an ‘s’ label suffix denoting the presence of sarcasm. The main corpus was annotated by two annotators working together, and therefore contains a single, unified sentiment label for each comment. The verification corpora were used to evaluate the quality, efficiency, and cost-effectiveness of the annotation framework, which is why they contain a separate sentiment label from each of six annotators for every comment. The construction of this dataset is described in the 2020 PLoS ONE paper.
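The label scheme above can be split into its parts programmatically. The following is a minimal Python sketch, not code shipped with the dataset: it assumes the sarcasm suffix is a lowercase 's' appended directly to the base label (for example "+1s"), and the names parse_label and SentimentLabel are hypothetical. The actual file layout and label spelling should be checked against the dataset's own documentation.

```python
import re
from typing import NamedTuple

# Assumed label shape: a sign, one of the three label types (1, M, NS),
# and an optional trailing 's' marking sarcasm.
LABEL_RE = re.compile(r"([+-])(1|M|NS)(s?)")


class SentimentLabel(NamedTuple):
    polarity: str    # "+" or "-"
    kind: str        # "1", "M", or "NS"
    sarcastic: bool  # True if the 's' suffix is present


def parse_label(raw: str) -> SentimentLabel:
    """Split a raw label string such as '+NSs' into its components."""
    match = LABEL_RE.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized sentiment label: {raw!r}")
    sign, kind, suffix = match.groups()
    return SentimentLabel(sign, kind, suffix == "s")


if __name__ == "__main__":
    for raw in ["+1", "-M", "+NSs"]:
        print(raw, parse_label(raw))
```

Keeping polarity, label type, and sarcasm as separate fields makes it straightforward to collapse the scheme into a coarser one (for instance, polarity only) without re-parsing the strings.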

The Serbian Movie Review Dataset (SerbMR)

The Serbian Movie Review Dataset (SerbMR) collection consists of three movie review datasets in Serbian which were constructed for the task of sentiment analysis:
Collected movie reviews in Serbian (ISLRN 252-457-966-231-5) – an unbalanced collection of 4725 movie reviews in Serbian.
SerbMR-2C – The Serbian Movie Review Dataset (2 Classes) (ISLRN 016-049-192-514-1) – a two-class balanced sentiment analysis dataset containing 1682 movie reviews in Serbian (841 positive and 841 negative reviews).
SerbMR-3C – The Serbian Movie Review Dataset (3 Classes) (ISLRN 229-533-271-984-0) – a three-class balanced sentiment analysis dataset containing 2523 movie reviews in Serbian (841 positive, 841 neutral, and 841 negative reviews).
The construction of this dataset collection is described in the LREC 2016 paper.