The SentiComments.SR dataset includes the following three corpora of short texts annotated for the task of sentiment analysis:
The main SentiComments.SR corpus, consisting of 3490 movie-related comments;
The movie verification corpus, consisting of 464 movie-related comments;
The book verification corpus, consisting of 173 book-related comments.
Six sentiment labels were used in dataset annotation: +1, -1, +M, -M, +NS, and -NS, with the addition of an ‘s’ label suffix denoting the presence of sarcasm. The main corpus was annotated by two annotators working together, and therefore contains a single, unified sentiment label for each comment. The verification corpora were used to evaluate the quality, efficiency, and cost-effectiveness of the annotation framework, which is why they contain a separate sentiment label from each of the six annotators. The construction of this dataset is described in the 2020 PLoS ONE paper.
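The label scheme described above can be split into a base label and a sarcasm flag. The following is a minimal sketch, not part of the dataset's own tooling: it assumes the sarcasm marker is a lowercase "s" appended directly to one of the six base labels (for example "+NSs"), and the names `parse_label` and `ParsedLabel` are hypothetical. The exact string form of the labels in the distributed files should be checked before relying on it.

```python
import re
from typing import NamedTuple

# Assumption: a label is one of +1, -1, +M, -M, +NS, -NS,
# optionally followed directly by a lowercase "s" marking sarcasm.
_LABEL_RE = re.compile(r"([+-](?:1|M|NS))(s?)")


class ParsedLabel(NamedTuple):
    base: str        # one of the six base sentiment labels
    sarcastic: bool  # True when the "s" suffix is present


def parse_label(raw: str) -> ParsedLabel:
    """Split a raw label string into its base label and sarcasm flag."""
    match = _LABEL_RE.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized sentiment label: {raw!r}")
    return ParsedLabel(base=match.group(1), sarcastic=match.group(2) == "s")


if __name__ == "__main__":
    for label in ["+1", "-M", "+NSs", "-1s"]:
        print(label, parse_label(label))
```

Rejecting unrecognized strings with an exception, rather than silently mapping them to a default, makes it easier to spot labels that do not follow the assumed format.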