The Serbian STS News Corpus (STS.news.sr)


The Serbian Semantic Textual Similarity News Corpus – STS.news.sr (ISLRN 146-979-597-345-4) consists of 1192 pairs of Serbian sentences gathered from news sources on the web. Each sentence pair was manually annotated with a fine-grained semantic similarity score on a 0–5 scale. The final score for each pair was obtained by averaging the individual scores of five annotators. The construction of the corpus is described in the LREC 2018 paper.
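The averaging step can be illustrated with a minimal Python sketch. The function name `final_score` and the example scores are hypothetical and are not taken from the corpus or its tooling. The sketch assumes only what is stated above: five annotator scores per pair, each on the 0–5 scale, combined by an arithmetic mean.

```python
def final_score(annotator_scores):
    """Return the arithmetic mean of five annotator scores (hypothetical helper)."""
    # The corpus description states five annotators per sentence pair.
    if len(annotator_scores) != 5:
        raise ValueError("expected exactly five annotator scores")
    # Each individual score is assumed to lie on the 0-5 scale.
    if any(not 0 <= s <= 5 for s in annotator_scores):
        raise ValueError("scores must lie on the 0-5 scale")
    return sum(annotator_scores) / len(annotator_scores)


# Made-up scores for a single sentence pair, for illustration only.
example_scores = [3, 4, 4, 3, 5]
print(final_score(example_scores))
```

Because the final score is a mean of five values, it is generally not a whole number, even if the individual annotator scores are.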