Text Clustering on CCSI System using Canopy and K-Means Algorithm

Akila, D. and Raja, S. R. and S, Jeyalaksshmi and Revathi, M. and Ashfaq, Farzeen and Khan, A. A. (2024) Text Clustering on CCSI System using Canopy and K-Means Algorithm. In: 2024 International Conference on Emerging Trends in Networks and Computer Communications (ETNCC), Windhoek, Namibia.

Full text not available from this repository.

Abstract

Text mining programs often offer numerous options for processing textual data. The inclusion of extensive metadata (such as links to other websites, titles, authors, and publication dates) can enrich text analysis and aid in text grouping. However, this side information, or “metadata,” can also introduce noise if not handled correctly. When used indiscriminately for clustering, this noise can degrade the quality of the resulting clusters. To address this challenge, we propose a methodology that uses a feature selection procedure to identify and apply only the metadata that enhances clustering effectiveness. The methodology, known as Co-Clustering with Side Information (C-CSI), is designed to maximize the benefits of relevant metadata while mitigating the effects of irrelevant or noisy data. C-CSI is implemented using data mining technologies, specifically co-clustering, which clusters both the rows and the columns of a matrix. In our study, we employed the canopy and k-means algorithms to perform the clustering tasks. This approach ensures that the metadata used in clustering is both relevant and beneficial, improving the overall quality of the clustering results.
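
The abstract names canopy and k-means as the clustering algorithms but the full text is not available here, so the following is only a minimal sketch of one common way to combine them, not the authors' implementation. It assumes canopy clustering is used as a cheap first pass whose centers seed k-means. The function names, the thresholds `t1` and `t2`, and the toy 2-D points are all hypothetical; the feature selection over side information described in the abstract is not modeled.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2):
    """Canopy pass: t1 is the loose threshold, t2 the tight one (0 <= t2 < t1).

    Returns a list of (center_index, member_indices) pairs. Canopies may
    overlap, because only points within t2 of a center are removed from
    the candidate list.
    """
    if not 0 <= t2 < t1:
        raise ValueError("expected 0 <= t2 < t1")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]
        members = [i for i in remaining
                   if distance(points[center], points[i]) <= t1]
        canopies.append((center, members))
        # The center itself is at distance 0, so it is always removed here.
        remaining = [i for i in remaining
                     if distance(points[center], points[i]) > t2]
    return canopies


def k_means(points, initial_centroids, max_iter=100):
    """Lloyd's k-means iteration starting from the given centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in range(len(centroids)):
            cluster = [points[i] for i, lab in enumerate(labels) if lab == k]
            if cluster:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(cluster) for col in zip(*cluster)]
    return labels, centroids


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
          (10.0, 10.0), (10.5, 10.0), (10.0, 10.5)]
canopies = canopy_centers(points, t1=3.0, t2=1.5)
seeds = [points[center] for center, _ in canopies]
labels, centroids = k_means(points, seeds)
```

In this arrangement the canopy pass also fixes the number of clusters, since k-means is run with one centroid per canopy; in a text setting the points would be document feature vectors rather than 2-D coordinates.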

Item Type: Conference or Workshop Item (Paper)
Subjects: Computer Applications > Computer Science
Domains: Computer Applications
Depositing User: Mr IR Admin
Date Deposited: 31 Aug 2025 09:59
Last Modified: 31 Aug 2025 09:59
URI: https://ir.vistas.ac.in/id/eprint/10902
