An Integrated Multi-Clustering Approach for Patient Stratification and Disease Prediction Using Electronic Medical Records in Vector-Borne Diseases

Muthukumaran, S and Nivathra, V and Arivazhagan, P (2025) An Integrated Multi-Clustering Approach for Patient Stratification and Disease Prediction Using Electronic Medical Records in Vector-Borne Diseases. In: National Conference on NextGen Computing and Future Technologies, 10.10.2025, VISTAS.

VISTAS-NCNCFT-2025-muthu.pdf - Published Version

Abstract

An electronic medical record (EMR) comprises both structured and unstructured patient
information, including but not limited to personal identifiers, vital signs, laboratory test results,
physicians' diagnostic notes, and prescriptions. Clustering techniques can be applied to EMR data to
group patients with similar symptoms based on their laboratory results and to identify anomalies, for
example those caused by faulty data entry. Objective: The very large number of records and the
extremely high dimensionality of EMR datasets impose a substantial computational burden when
developing machine learning models for disease prediction. Methods: To address this, this research
proposes an Integrated Multi-Clustering Algorithm (IMCA) that combines three clustering methods:
K-Means, Agglomerative, and DBSCAN. The study uses a vector-borne disease dataset comprising 64
manifestations related to 11 different types of fever. The output of each clustering algorithm in the
IMCA was evaluated individually to group patients with similar disease conditions or symptoms and to
assist clinicians in identifying subgroups that may benefit from specific treatments. Results: The
Calinski-Harabasz Index (CHI), Davies-Bouldin Index (DBI), and Silhouette Score were used to
evaluate the performance of the clustering algorithms. The results indicated that K-Means produced
more balanced clusters with a more favourable CHI value, but slightly lower compactness, than the
Agglomerative and DBSCAN methods.
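
The three evaluation measures named in the abstract can be illustrated with a minimal sketch. This is
not the authors' IMCA or their dataset: the eight 2-D points, the two labelings, and the function names
below are hypothetical and chosen only to show what each index rewards. Higher CHI and Silhouette
values and lower DBI values indicate better-separated, more compact clusters. The sketch assumes
Euclidean distance, at least two clusters, and no noise label (DBSCAN marks noise points separately,
which would need extra handling).

```python
import math
from collections import defaultdict


def _group(X, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = defaultdict(list)
    for x, lab in zip(X, labels):
        clusters[lab].append(x)
    return clusters


def _centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def silhouette(X, labels):
    """Mean of (b - a) / max(a, b) over all points."""
    clusters = _group(X, labels)
    total = 0.0
    for x, lab in zip(X, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(x, y) for y in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(x, y) for y in pts) / len(pts)
                for other, pts in clusters.items() if other != lab)
        total += (b - a) / max(a, b)
    return total / len(X)


def calinski_harabasz(X, labels):
    """Ratio of between-cluster to within-cluster dispersion."""
    clusters = _group(X, labels)
    n, k = len(X), len(clusters)
    overall = _centroid(X)
    between = within = 0.0
    for pts in clusters.values():
        c = _centroid(pts)
        between += len(pts) * math.dist(c, overall) ** 2
        within += sum(math.dist(x, c) ** 2 for x in pts)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(X, labels):
    """Mean, over clusters, of the worst (scatter_i + scatter_j) / centroid distance."""
    clusters = _group(X, labels)
    cents = {lab: _centroid(pts) for lab, pts in clusters.items()}
    scatter = {lab: sum(math.dist(x, cents[lab]) for x in pts) / len(pts)
               for lab, pts in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)


# Hypothetical toy data: two tight, well-separated groups of four points each.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good = [0, 0, 0, 0, 1, 1, 1, 1]   # labeling that matches the two groups
bad = [0, 1, 0, 1, 0, 1, 0, 1]    # labeling that mixes the two groups

for name, labels in (("good", good), ("bad", bad)):
    print(name,
          round(silhouette(X, labels), 3),
          round(calinski_harabasz(X, labels), 3),
          round(davies_bouldin(X, labels), 3))
```

On this toy data the labeling that matches the two groups scores better on all three measures than the
mixed labeling. In practice, libraries such as scikit-learn provide these three measures ready-made,
along with implementations of K-Means, Agglomerative, and DBSCAN clustering.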

Item Type: Conference or Workshop Item (Paper)
Subjects: Computer Science Engineering > Artificial Intelligence
Domains: Allied Health Sciences
Depositing User: Mr IR Admin
Date Deposited: 19 May 2026 03:47
Last Modified: 19 May 2026 03:47
URI: https://ir.vistas.ac.in/id/eprint/20172
