Latent Dirichlet Allocation for Topic Discovery and Segmentation in Big Data

Clementking, A. and Rani, S. and Roseline, R. and K E, Purushothaman and Kavitha, G. and Murugan, S. (2024) Latent Dirichlet Allocation for Topic Discovery and Segmentation in Big Data. In: 2024 Global Conference on Communications and Information Technologies (GCCIT), Bangalore, India.

Full text not available from this repository.

Official URL: https://doi.org/10.1109/GCCIT63234.2024.10862564

Abstract

Applying Latent Dirichlet Allocation (LDA) to topic identification and segmentation in big data helps extract significant patterns and topics from large text corpora. LDA is implemented and optimized to process and analyze large datasets quickly, revealing hidden topics and improving content structure. A robust framework for accurate and scalable topic modeling would improve analysis and decision-making in social media analytics, consumer feedback, and academic research. The LDA technique must be refined to handle the high dimensionality and complexity of big data while remaining computationally efficient. A topic identification tool that processes large-scale text data quickly and reliably can reveal thematic patterns and improve big data management and use. On the Bigdata Corpus, the topic distribution across documents, for a sample of 5 topics and 5 documents, varies from 0.1 to 0.25. For the same dataset, the top words per topic (10 example words for 10 topics, taken from another instance of the dataset) have values from 0.03 to 0.15. Document clustering based on topic proportions, for a sample of 5 documents clustered over 5 topics, yields values from 0.1 to 0.75.
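
The two outputs the abstract reports, per-document topic proportions and top words per topic, can be illustrated with a minimal sketch of LDA fitted by collapsed Gibbs sampling. This is not the authors' implementation, which the record does not describe; the function names `lda_gibbs` and `top_words`, the hyperparameter defaults, and the toy corpus are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # times word v is assigned to topic k
    topic_total = [0] * num_topics                      # tokens assigned to topic k
    assign = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z = []
        for w in doc:
            k = rng.randrange(num_topics)
            z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(z)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates: theta[d][k] = P(topic k | doc d), phi[k][v] = P(word v | topic k).
    theta = [[(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
             for row, doc in zip(doc_topic, docs)]
    phi = [[(c + beta) / (topic_total[k] + V * beta) for c in topic_word[k]]
           for k in range(num_topics)]
    return theta, phi, vocab


def top_words(phi, vocab, n=3):
    """Return the n most probable words for each topic."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in phi]


# Hypothetical toy corpus, invented for illustration only.
docs = [
    "refund order shipping late order".split(),
    "shipping delay refund order".split(),
    "model topic corpus inference topic".split(),
    "corpus model inference sampling".split(),
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
for d, row in enumerate(theta):
    print(d, [round(p, 2) for p in row])   # topic proportions per document
print(top_words(phi, vocab))               # top words per topic
```

Each row of `theta` is a probability distribution over topics for one document, so documents can be grouped by comparing these rows, which is one plausible reading of the clustering by topic proportions that the abstract mentions. A corpus at the scale the abstract targets would need a distributed or online variational implementation rather than this single-threaded sampler.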

Item Type:	Conference or Workshop Item (Paper)
Subjects:	Computer Science Engineering > Big Data
Domains:	Computer Applications
Depositing User:	Mr IR Admin
Date Deposited:	22 Aug 2025 06:47
Last Modified:	22 Aug 2025 06:47
URI:	https://ir.vistas.ac.in/id/eprint/10406
