Automated Data Quality Scoring using Statistical Drift Detection and Rule-based Semantic Constraints

Nisha Dayana, T R and Jeyashree, R and Arivazhagan, P and Prathiba, P and Keerthana, K (2026) Automated Data Quality Scoring using Statistical Drift Detection and Rule-based Semantic Constraints. In: 9th International Conference on Inventive Computation Technologies (ICICT-2026), 15-17 April 2026, Nepal.

Abstract

High-quality data is essential for analytics, machine learning, and operational decision-making. However, traditional data-quality frameworks have been limited in their ability to capture changes in data distributions over time and inconsistencies in how terms are interpreted. Existing solutions usually rely on structural profiling or univariate checks, which leaves gaps in detecting evolving drift and rule violations that can compromise downstream applications. This study develops a hybrid framework that combines statistical drift detection with rule-based semantic constraints to provide data quality scoring that is interpretable and adaptive. The proposed methodology is a multi-step approach comprising baseline profiling, statistical drift monitoring, rule-based semantic integrity checks, and composite scoring with adaptive feedback. Experiments were performed on a semi-synthetic healthcare claims dataset with injected covariate, label, and concept drift scenarios. The proposed StatDrift + RuleFusion method achieved an AUROC of 0.94, AUPR of 0.88, F1 score of 0.86, Brier score of 0.045, FNR of 0.07, and FDR of 0.05, outperforming five state-of-the-art methods. The framework combines the results of statistical and semantic evaluation to detect, quantify, and communicate data quality issues in real time, providing a scalable and interpretable solution for modern data governance.
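
As a rough illustration of how a drift statistic and rule violations could be fused into one score, the sketch below may help. It is not the StatDrift + RuleFusion method, whose details the abstract does not give. It assumes a two-sample Kolmogorov-Smirnov statistic as the drift measure, a per-record violation rate as the semantic measure, and an equal-weight linear combination. The function names, the `drift_weight` parameter, the field names, and the two example rules are all hypothetical.

```python
from bisect import bisect_right


def ks_statistic(baseline, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(baseline), sorted(current)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def violation_rate(records, rules):
    """Fraction of records that break at least one rule."""
    bad = sum(1 for rec in records if not all(rule(rec) for rule in rules))
    return bad / len(records)


def quality_score(drift, violations, drift_weight=0.5):
    """Composite score in [0, 1]; 1.0 means no drift and no violations.

    The linear weighting is an assumption made for this sketch.
    """
    penalty = drift_weight * drift + (1 - drift_weight) * violations
    return 1.0 - penalty


# Hypothetical semantic rules for a claims-like record.
RULES = [
    lambda rec: 0 <= rec["age"] <= 120,  # plausible patient age
    lambda rec: rec["amount"] > 0,       # claim amount must be positive
]

# Made-up baseline amounts and an incoming batch, for illustration only.
baseline_amounts = [100, 120, 130, 150, 170]
batch = [
    {"age": 34, "amount": 300},
    {"age": 51, "amount": 320},
    {"age": 150, "amount": 310},  # breaks the age rule
    {"age": 47, "amount": 305},
]

drift = ks_statistic(baseline_amounts, [rec["amount"] for rec in batch])
violations = violation_rate(batch, RULES)
score = quality_score(drift, violations)
```

In this made-up batch every amount lies above the baseline range, so the drift term is at its maximum, and one record in four breaks a rule. An adaptive-feedback step like the one the abstract mentions would presumably adjust the weight or the rules over time; that part is not sketched here.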

Item Type: Conference or Workshop Item (Paper)
Subjects: Computer Applications > Artificial Intelligence
Domains: Computer Science
Depositing User: Mr IR Admin
Date Deposited: 07 May 2026 12:38
Last Modified: 16 May 2026 10:30
URI: https://ir.vistas.ac.in/id/eprint/13943
