Unsupervised learning by probabilistic latent semantic analysis

Authors
T. Hofmann
Citation
T. Hofmann, Unsupervised learning by probabilistic latent semantic analysis, MACH LEARN, 42(1-2), 2001, pp. 177-196
Number of citations
24
Language
English
Article type
Article
Subject categories
AI Robotics and Automatic Control
Journal title
MACHINE LEARNING
ISSN journal
0885-6125
Volume
42
Issue
1-2
Year of publication
2001
Pages
177 - 196
Database
ISI
SICI code
0885-6125(200101)42:1-2<177:ULBPLS>2.0.ZU;2-N
Abstract
This paper presents a novel statistical method for factor analysis of binary and count data which is closely related to a technique known as Latent Semantic Analysis. In contrast to the latter method, which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed technique uses a generative latent class model to perform a probabilistic mixture decomposition. This results in a more principled approach with a solid foundation in statistical inference. More precisely, we propose to make use of a temperature-controlled version of the Expectation Maximization algorithm for model fitting, which has shown excellent performance in practice. Probabilistic Latent Semantic Analysis has many applications, most prominently in information retrieval, natural language processing, machine learning from text, and in related areas. The paper presents perplexity results for different types of text and linguistic data collections and discusses an application in automated document indexing. The experiments indicate substantial and consistent improvements of the probabilistic method over standard Latent Semantic Analysis.
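The abstract names the ingredients (a latent class model fitted by a temperature-controlled EM algorithm) but this record gives no formulas. The following is a minimal sketch of how such a fit might look, assuming the commonly cited symmetric parameterization P(d, w) = sum_z P(z) P(d|z) P(w|z) and a tempered E-step in which the class-conditional likelihood is raised to a power beta <= 1. The function name `plsa_tempered_em`, its parameters, and the toy count matrix are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def plsa_tempered_em(counts, n_topics, beta=1.0, n_iter=50, seed=0):
    """Fit a latent class model to a document-word count matrix.

    Hypothetical helper: beta = 1 gives ordinary EM, beta < 1 flattens
    the posteriors (the "temperature" assumed in this sketch).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    tiny = 1e-12  # guards against division by zero

    # Random, strictly positive, row-normalized initial parameters.
    p_z = np.full(n_topics, 1.0 / n_topics)            # P(z)
    p_d_z = rng.random((n_topics, n_docs)) + 0.1       # P(d|z)
    p_d_z /= p_d_z.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words)) + 0.1      # P(w|z)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Tempered E-step: P(z|d,w) proportional to P(z) [P(d|z) P(w|z)]^beta.
        joint = p_z[:, None, None] * (p_d_z[:, :, None] * p_w_z[:, None, :]) ** beta
        post = joint / np.maximum(joint.sum(axis=0, keepdims=True), tiny)

        # M-step: re-estimate parameters from expected counts n(d,w) P(z|d,w).
        weighted = counts[None, :, :] * post
        totals = np.maximum(weighted.sum(axis=(1, 2)), tiny)
        p_d_z = weighted.sum(axis=2) / totals[:, None]
        p_w_z = weighted.sum(axis=1) / totals[:, None]
        p_z = totals / totals.sum()

    return p_z, p_d_z, p_w_z

# Toy co-occurrence table: 4 documents by 4 words (made-up data).
counts = np.array([[4, 3, 0, 0],
                   [5, 2, 0, 1],
                   [0, 0, 6, 3],
                   [0, 1, 4, 5]], dtype=float)
p_z, p_d_z, p_w_z = plsa_tempered_em(counts, n_topics=2, beta=0.9, n_iter=30)
```

The dense (topics x documents x words) posterior array keeps the sketch short but would not scale to realistic collections; a practical version would iterate over nonzero counts only and would choose beta using held-out data.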