Neural Models for Documents with Metadata

Dallas Card, Chenhao Tan, and Noah A. Smith.
In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018).

Abstract:
Most real-world document collections involve various types of metadata, such as author, source, and date, and yet the most commonly-used approaches to modeling text corpora ignore this information. While specialized models have been developed for particular applications, few are widely used in practice, as customization typically requires derivation of a custom inference algorithm. In this paper, we build on recent advances in variational inference methods and propose a general neural framework, based on topic models, to enable flexible incorporation of metadata and allow for rapid exploration of alternative models. Our approach achieves strong performance, with a manageable tradeoff between perplexity, coherence, and sparsity. Finally, we demonstrate the potential of our framework through an exploration of a corpus of articles about US immigration.


Figure: Immigration temporal dynamics.

@inproceedings{card+etal:18,
     author = {Dallas Card and Chenhao Tan and Noah A. Smith},
     title = {Neural Models for Documents with Metadata},
     year = {2018},
     booktitle = {Proceedings of ACL}
}