Efficient Document Clustering via Online Nonnegative Matrix Factorizations

Fei Wang, Chenhao Tan, Arnd Christian Konig, Ping Li
In Proceedings of the SIAM International Conference on Data Mining (SDM'2011)

In recent years, Nonnegative Matrix Factorization (NMF) has received considerable interest from the data mining and information retrieval fields. NMF has been successfully applied to document clustering, image representation, and other domains. This study proposes an online NMF (ONMF) algorithm to efficiently handle very large-scale and/or streaming datasets. Unlike conventional NMF solutions, which require the entire data matrix to reside in memory, our ONMF algorithm processes one data point or one chunk of data points at a time. Experiments with one-pass and multi-pass ONMF on real datasets are presented.
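
To make the chunk-at-a-time idea concrete, here is a minimal sketch of one generic way to run NMF online. It is not taken from the paper and may differ from the ONMF algorithm proposed there. The function name `online_nmf`, the use of multiplicative updates, and the running statistics `XHt` and `HHt` are all assumptions made for illustration. Each chunk is assumed to be a nonnegative array with one column per document.

```python
import numpy as np

def online_nmf(chunks, n_features, rank, inner_iters=50, eps=1e-9, seed=0):
    """One pass of a simple online NMF over an iterable of data chunks.

    Hypothetical helper: each chunk X has shape (n_features, chunk_size).
    Returns the basis W and a list with one coefficient matrix H per chunk.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((n_features, rank)) + eps

    # Running statistics over all chunks seen so far. Their size does not
    # depend on the number of documents, so old chunks can be discarded.
    XHt = np.zeros((n_features, rank))  # sum of X_t @ H_t.T
    HHt = np.zeros((rank, rank))        # sum of H_t @ H_t.T

    coeffs = []
    for X in chunks:
        # Step 1: fit coefficients for the new chunk with W held fixed.
        H = rng.random((rank, X.shape[1])) + eps
        WtX = W.T @ X
        WtW = W.T @ W
        for _ in range(inner_iters):
            H *= WtX / (WtW @ H + eps)

        # Step 2: fold the chunk into the running statistics.
        XHt += X @ H.T
        HHt += H @ H.T

        # Step 3: refresh W from the statistics alone.
        for _ in range(inner_iters):
            W *= XHt / (W @ HHt + eps)

        coeffs.append(H)
    return W, coeffs
```

In this sketch, a document could be assigned to a cluster by taking the index of the largest entry in its column of `H`, for example `H.argmax(axis=0)`. A multi-pass variant would simply iterate over the chunks again, starting from the `W` obtained in the previous pass.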

@inproceedings{wang+etal:11,
     author = {Fei Wang and Chenhao Tan and Arnd Christian Konig and Ping Li},
     title = {Efficient Document Clustering via Online Nonnegative Matrix Factorizations},
     year = {2011},
     booktitle = {Proceedings of SDM}
}