To Each His Own: Personalized Content Selection based on Text Comprehensibility

Chenhao Tan, Evgeniy Gabrilovich, Bo Pang
In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (WSDM'2012)

Imagine a physician and a patient doing a search on antibiotic resistance. Or a chess amateur and a grandmaster conducting a search on Alekhine’s Defence. Although the topic is the same, arguably the two users in each case will satisfy their information needs with very different texts. Yet today search engines mostly adopt the one-size-fits-all solution, where personalization is restricted to topical preference. We found that users do not uniformly prefer simple texts, and that the text comprehensibility level should match the user’s level of preparedness. Consequently, we propose to model the comprehensibility of texts as well as the users’ reading proficiency in order to better explain how different users choose content for further exploration. We also model topic-specific reading proficiency, which allows us to better explain why a physician might choose to read sophisticated medical articles yet simple descriptions of SLR cameras. We explore different ways to build user profiles, and use collaborative filtering techniques to overcome data sparsity. We conducted experiments on large-scale datasets from a major Web search engine and a community question answering forum. Our findings confirm that explicitly modeling text comprehensibility can significantly improve content ranking (search results or answers, respectively).

[Slides][PDF]

@inproceedings{tan+gabrilovich+pang:12,
     author = {Chenhao Tan and Evgeniy Gabrilovich and Bo Pang},
     title = {To Each His Own: Personalized Content Selection based on Text Comprehensibility},
     year = {2012},
     booktitle = {Proceedings of WSDM}
}