Cornell Statement Strength Dataset v1.0 (released April 2014)

Distributed together with:

A Corpus of Sentence-level Revisions in Academic Writing: A Step towards Understanding Statement Strength in Communication
Chenhao Tan, Lillian Lee
In Proceedings of ACL (short papers), 2014

The paper, data, and associated materials can be found at:
http://chenhaot.com/pages/statement-strength.html

If you use this data, please cite:

@inproceedings{tan+lee:14,
  author = {Chenhao Tan and Lillian Lee},
  title = {A Corpus of Sentence-level Revisions in Academic Writing: A Step towards Understanding Statement Strength in Communication},
  year = {2014},
  booktitle = {Proceedings of ACL (short papers)}
}

Files description:

This dataset contains two files.

matched_sentences.csv:
  108,678 aligned sentence pairs from scientific paper abstracts or
  introductions, where the similarity score for the pair was larger than 0.5.
  Each line contains the arXiv id of the paper, the section the pair is
  from, and the two sentences of the pair.
  arXiv papers can be retrieved from http://arxiv.org .

turker_labels.txt:
  Blank lines separate 500 groups of 11 lines.
  Each 11-line group has the following format:
    First two lines: the two sentences in a pair
    Next nine lines, each in the following format:
      id: (label) comment
  The id is an anonymized numerical id for the labelers, who were employed
  through Amazon Mechanical Turk.

Please email any questions to: chenhao@chenhaot.com
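Reading turker_labels.txt: the group layout described above (two sentences, then label lines of the form "id: (label) comment", with blank lines between groups) could be parsed with something like the following sketch. It is not part of the dataset distribution. The function name parse_turker_labels and the dictionary keys are made up for illustration, and the regular expression assumes that the label contains no closing parenthesis and that the comment may be empty; check both against the actual file.

```python
import re

# Assumed shape of a label line: "<id>: (<label>) <comment>".
# The comment is allowed to be empty; the label is taken up to the
# first closing parenthesis (an assumption about the file).
LABEL_LINE = re.compile(r"^\s*(\S+?):\s*\((.*?)\)\s*(.*)$")


def parse_turker_labels(text):
    """Parse the contents of turker_labels.txt into a list of groups."""
    groups = []
    # Blank lines separate groups.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        if len(lines) < 2:
            continue  # not a sentence pair; skip
        labels = []
        for ln in lines[2:]:
            m = LABEL_LINE.match(ln)
            if m is None:
                # Keep lines that do not fit the assumed pattern
                # instead of guessing at their structure.
                labels.append({"id": None, "label": None,
                               "comment": ln.strip()})
            else:
                labels.append({"id": m.group(1),
                               "label": m.group(2),
                               "comment": m.group(3).strip()})
        groups.append({"sentence1": lines[0],
                       "sentence2": lines[1],
                       "labels": labels})
    return groups
```

To use it, read the file into a string (for example with open("turker_labels.txt", encoding="utf-8")) and pass the contents to parse_turker_labels. If the file matches the description above, the result should hold 500 groups with nine labels each; a different count suggests the assumptions in the sketch need adjusting.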