Author
Schiavon, Lorenzo
Title
Addressing topic modelling via reduced latent space clustering
Journal
Statistical methods & applications : Journal of the Italian Statistical Society
Year: 2025 - Volume: 34 - Issue: 1 - First page: 1 - Last page: 20
In the social sciences, topic modelling is gaining increased attention for its ability to automatically uncover the underlying themes within large corpora of textual data. This process typically involves two key phases: (i) identifying the words associated with language concepts, and (ii) clustering documents that share similar word distributions. In this study, motivated by the growing interest in automatic categorisation of policy documents and regulations, we leverage recent advancements in Bayesian factor models to develop a novel topic modelling approach. This enables us to represent the high-dimensional space defined by all possible observed words through a small set of latent variables, and simultaneously cluster the documents based on their distributions over these latent constructs. Here, groups and underlying constructs are interpreted as document topics and language concepts, respectively, with the number of dimensions not required in advance. Additionally, we demonstrate the effectiveness of our approach using synthetic data, providing a comparison with existing methods in the literature. The illustration of our approach on a corpus of Italian health public plans unveils intriguing patterns concerning the semantic structures used in ageing policies and document topic similarities.
SICI: 1618-2510(2025)34:1<1:ATMVRL>2.0.ZU;2-4
Full text:
https://link.springer.com/article/10.1007/s10260-025-00779-z