Friendships, Rivalries, and Trysts: Characterizing Relations between Ideas in Texts

Chenhao Tan, Dallas Card, Noah A. Smith
In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL'2017)

Understanding how ideas relate to each other is a fundamental question in many domains, ranging from intellectual history to public communication. Because ideas are naturally embedded in texts, we propose the first framework to systematically characterize the relations between ideas based on their occurrence in a corpus of documents, independent of how these ideas are represented. Combining two statistics --- cooccurrence within documents and prevalence correlation over time --- our approach reveals a number of different ways in which ideas can cooperate and compete. For instance, two ideas can closely track each other's prevalence over time, and yet rarely cooccur, almost like a "cold war" scenario. We observe that pairwise cooccurrence and prevalence correlation exhibit different distributions. We further demonstrate that our approach is able to uncover intriguing relations between ideas through in-depth case studies on news articles and research papers.
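To make the two statistics concrete, here is a minimal sketch of how they could be computed for a pair of ideas. It is not the authors' released code. It assumes that each document has already been reduced to a time period plus a set of idea labels, that cooccurrence is scored with pointwise mutual information, and that prevalence correlation is the Pearson correlation of per-period document fractions. The abstract does not specify these choices, and all function names and relation labels below are hypothetical.

```python
import math
from collections import defaultdict


def cooccurrence_pmi(docs, a, b):
    """Pointwise mutual information of ideas a and b over documents.

    docs is a list of (period, set_of_ideas) pairs (an assumed input format).
    Returns -inf when the two ideas never appear in the same document.
    """
    n = len(docs)
    n_a = sum(1 for _, ideas in docs if a in ideas)
    n_b = sum(1 for _, ideas in docs if b in ideas)
    n_ab = sum(1 for _, ideas in docs if a in ideas and b in ideas)
    if n_ab == 0:
        return float("-inf")
    # log( P(a, b) / (P(a) * P(b)) ), written with raw counts
    return math.log((n_ab * n) / (n_a * n_b))


def prevalence_by_period(docs, idea):
    """Fraction of documents in each period that contain the idea."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for period, ideas in docs:
        totals[period] += 1
        if idea in ideas:
            hits[period] += 1
    return {p: hits[p] / totals[p] for p in sorted(totals)}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant or empty."""
    n = len(xs)
    if n == 0:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    denom = math.sqrt(vx * vy)
    return cov / denom if denom > 0 else 0.0


def prevalence_correlation(docs, a, b):
    """Correlation between the per-period prevalence of ideas a and b."""
    pa = prevalence_by_period(docs, a)
    pb = prevalence_by_period(docs, b)
    # Both dicts are built from the same documents, so their periods align.
    return pearson(list(pa.values()), list(pb.values()))


def describe_relation(pmi, corr):
    """Combine the signs of the two statistics (labels are illustrative only)."""
    together = "often cooccur" if pmi > 0 else "rarely cooccur"
    trend = "prevalence correlated" if corr > 0 else "prevalence anti-correlated"
    return f"{together}, {trend}"


# Toy corpus: ideas "a" and "b" both grow over three periods but never share a document.
toy_docs = [
    (1, {"a"}), (1, {"b"}), (1, {"c"}), (1, {"c"}), (1, {"c"}), (1, {"c"}),
    (2, {"a"}), (2, {"a"}), (2, {"b"}), (2, {"b"}), (2, {"c"}), (2, {"c"}),
    (3, {"a"}), (3, {"a"}), (3, {"a"}), (3, {"b"}), (3, {"b"}), (3, {"b"}),
]
print(describe_relation(cooccurrence_pmi(toy_docs, "a", "b"),
                        prevalence_correlation(toy_docs, "a", "b")))
```

The toy corpus is built to mimic the "cold war" pattern mentioned above: the two ideas rise together over time yet never appear in the same document. Because the functions only need to know which ideas occur in which documents, the same sketch applies whether ideas are topics, keywords, or some other representation.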

[PDF] [Slides] [blog] [Supplementary material] [Code] [Visualization tool] [ACL Data(README)] [NIPS Data(README)]


     author = {Chenhao Tan and Dallas Card and Noah A. Smith},
     title = {Friendships, Rivalries, and Trysts: Characterizing Relations between Ideas in Texts},
     year = {2017},
     booktitle = {Proceedings of ACL}