Evaluation and Classification of Syntax Usage in Determining Short-Text Semantic Similarity

Abstract

This paper outlines and categorizes the ways in which a number of algorithms for determining the semantic similarity of short texts use syntactic information. We consider the use of word order information, part-of-speech tagging, parsing, and semantic role labeling. We analyze and evaluate how syntax usage affects algorithm performance, using the results of a paraphrase detection test on the Microsoft Research Paraphrase Corpus. We also propose a new classification of algorithms based on their applicability to languages with scarce natural language processing tools.

Publication
Telfor Journal, Vol. 6, No. 1, pp. 64-68