Hybrid feature selection technique for prediction of cardiovascular diseases

doi:10.1016/j.matpr.2021.03.225

Materials Today: Proceedings

Volume 81, Part 2, 2023, Pages 336-340


Referred to by

IVCSM 2K20 – EXPRESSION OF CONCERN – PART 4

Materials Today: Proceedings, Volume 81, Part 2, 2023, Pages 90


Abstract

Diagnosing a disease takes considerable time and requires sophisticated technical methods. Smart technologies, however, have grown rapidly in the healthcare industry; they improve the daily life of patients and reduce the workload and treatment cost of health care organizations. Disease prediction is one of the major challenges society faces today. A recent survey also reported that the death rate from CAD is remarkably high, because most people are affected by cardiovascular diseases. Predicting and diagnosing cardiovascular diseases at a preliminary stage is therefore essential to reduce the death rate. Earlier studies used machine learning techniques to predict diseases, but did not give proper attention to identifying features with suitable feature selection methods. This paper proposes a new feature selection technique, HRFLC (Random Forest + AdaBoost + Pearson coefficient). The method helps predict disease efficiently and improves prediction accuracy.

Introduction

Nowadays most people are affected by several diseases. Analysing and diagnosing a disease is a difficult task for health care organizations, which hold a huge amount of clinical data. Without an intelligent system, it is exceedingly difficult to extract useful medical information from that data. Over the decades, data mining techniques have had a large effect on extracting information from data sets and predicting human disease. Many new technologies have been developed, but for predicting a disease, data mining combined with machine learning techniques plays a very effective role [1]. With the help of machine learning techniques, a prediction model can be built for numerous diseases such as breast cancer, brain tumour, ovarian cancer, lung cancer, heart diseases, etc. Current research states that the most frequent diseases faced by people all over the world are cardiovascular diseases.

In 2017, the WHO reported that the global death rate from cardiovascular diseases is extremely high. Men are affected by cardiovascular disease more often than women. To reduce the death rate, it is important to predict the disease at an early stage. Even experts can find it exceedingly difficult to identify CAD, because it presents with many different signs and risk factors, such as age, BP, diabetes, contraction of heartbeat, heartburn, sweating, etc. Heart diseases are of different types with various symptoms, such as CAD (a blockage in the arteries), HF (improper circulation of blood) and CHD (improper formation of the heart in the womb) [2]. Prediction of Coronary Artery Disease (CAD) is a classification problem in machine learning and involves three major steps: 1) exploratory data analysis, 2) feature selection and 3) model creation and prediction. Redundant and duplicate data are removed during exploratory data analysis; next, the important features that contribute to predicting the target are identified using feature selection. During feature selection, the features are grouped into ordinal and categorical data and a suitable technique is applied to each group. These steps improve the accuracy of the model and reduce its total execution time, since less computation is required (Fig. 1 and Fig. 2).
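The clean-up performed during exploratory data analysis can be illustrated with a minimal sketch. The record layout, the `dataset_id` column and both helper names are hypothetical, chosen only for illustration; the paper does not state how duplicates or redundant columns were actually detected.

```python
def drop_duplicate_rows(rows):
    """Return rows with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def constant_columns(rows):
    """Return indices of columns that hold one single value in every row."""
    if not rows:
        return []
    width = len(rows[0])
    return [j for j in range(width) if len({row[j] for row in rows}) == 1]


# Hypothetical records: (age, sex, resting_blood_pressure, dataset_id)
records = [
    (63, 1, 145, 7),
    (41, 0, 130, 7),
    (63, 1, 145, 7),  # exact duplicate of the first record
    (57, 1, 120, 7),
]

cleaned = drop_duplicate_rows(records)
redundant = constant_columns(cleaned)  # columns that cannot help prediction
print(len(cleaned), redundant)
```

A column that never varies, like the invented `dataset_id` here, carries no information about the target, so it is a natural candidate for removal before feature selection.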

Feature selection plays an important role in reducing overfitting, since duplicated or redundant features make predictions wrong. Even if a model reaches good accuracy on the training data, it may not work well on the testing data; this issue is known as the "variance and bias" problem. A good model should have low variance and low bias, and feature selection helps achieve this so that the model performs well on both training and testing data (Fig. 3).
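For squared-error loss, the interplay of bias and variance has a standard decomposition, given here as background. The notation is introduced only for illustration and is not taken from the paper: $y = f(x) + \varepsilon$ is the observed target with noise variance $\sigma^2$, and $\hat{f}$ is the model fitted on a random training sample.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \sigma^2
```

Redundant features mainly inflate the variance term, which is one way to read the claim that feature selection helps a model generalize from training to testing data. For classification accuracy the decomposition holds only by analogy.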

Feature selection methods are of three types: 1) filter methods, 2) wrapper methods and 3) embedded methods [3]. Filter methods are computationally cheap because they do not involve any machine learning model; the other methods are more effective than filter methods but computationally costlier. The model used for classification needs to be validated with an evaluation technique. For classification problems the following measures are used: 1) Accuracy 2) Precision 3) Sensitivity 4) Recall (the same quantity as sensitivity) and 5) Kappa. Depending on the model and the type of dataset, the model is evaluated with a suitable measure and the best model is chosen based on the score obtained.
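The evaluation measures listed above can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch under that assumption; the function name `binary_metrics` and the toy labels are hypothetical and do not come from the paper. Specificity is included because it is the usual counterpart of sensitivity.

```python
def binary_metrics(y_true, y_pred):
    """Compute common classification measures for labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Toy labels invented for illustration (1 = disease present).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa is useful on imbalanced medical data because it discounts the agreement a classifier would reach by chance alone.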


Section snippets

Related work

Qinglin et al. [4] proposed an efficient framework to diagnose a disease and to identify a feature subset from a large dataset. The proposed method works as follows: i) a GA (genetic algorithm) generates the initial positions and GWO (grey wolf optimization) updates the current positions; ii) KELM is used to obtain better classification; iii) the two are hybridized (GWO + KELM) to improve accuracy. This

Background study

Various methods are employed for disease prediction, either traditional methods or machine learning algorithms. However, these methods have achieved low accuracy, sensitivity and specificity. A good model should balance sensitivity and specificity. Existing models such as logistic regression, recursive feature selection, genetic algorithms and deep learning have performed well to a certain extent and achieved a certain level of accuracy. There is much research published based on the

Proposed method HRFLC

In this paper we use a new feature selection technique that combines Random Forest, AdaBoost and linear correlation, based on comparison with the target variable. Using this technique, the important features are identified and applied to different machine learning techniques. The method is implemented in a Python environment to identify the significant features in the dataset. It provides a graphical representation of the dataset, work environment and predictive

Results and discussions

Comparisons are made between models that use the feature selection techniques described in the paper and models without feature selection. Based on the HRFLC algorithm, 11 features are selected and the evaluation parameters are compared. The results show that accuracy is improved using the proposed model.
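One way a hybrid selection of this kind could work is sketched below. This is not the authors' implementation: the preview does not say how the three scores are combined, so averaging per-feature ranks is an assumption made here. The helper names, the toy columns and the two importance tables are likewise hypothetical; in practice the importances would come from a fitted random forest and a fitted AdaBoost model.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def ranks(scores):
    """Map each feature to its rank, where 1 is the highest score."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def hybrid_select(score_tables, k):
    """Keep the k features with the lowest mean rank across all tables."""
    rank_tables = [ranks(table) for table in score_tables]
    features = list(score_tables[0])
    mean_rank = {
        f: sum(table[f] for table in rank_tables) / len(rank_tables)
        for f in features
    }
    return sorted(features, key=lambda f: mean_rank[f])[:k]


# Toy data invented for illustration only (1 = disease present).
target = [1, 1, 0, 0, 1, 0]
columns = {
    "age": [63, 58, 41, 39, 66, 45],
    "oldpeak": [2.3, 1.8, 0.2, 0.0, 2.6, 0.4],
    "noise": [3, 1, 2, 3, 2, 1],
}
correlation = {
    name: abs(pearson(values, target)) for name, values in columns.items()
}

# Placeholder importances standing in for fitted ensemble models.
forest_importance = {"age": 0.35, "oldpeak": 0.55, "noise": 0.10}
boosting_importance = {"age": 0.30, "oldpeak": 0.60, "noise": 0.10}

selected = hybrid_select(
    [forest_importance, boosting_importance, correlation], k=2
)
print(selected)
```

Rank averaging avoids mixing scores on different scales, since tree importances sum to one while correlations lie between 0 and 1. Other combination rules, such as intersecting the top features of each method, would be equally plausible readings of the description.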

The 11 features selected by the model are:

'age', 'sex', 'chest', 'resting_blood_pressure', 'serum_cholestoral', 'fasting_blood_sugar', 'maximum_heart_rate_achieved', 'oldpeak', 'slope', 'number_of_major_vessels', 'thal'. The below

Conclusion

Identifying and diagnosing a disease at an early stage can save human lives. Most people are affected by cardiovascular diseases, and preventing these diseases is the major challenge for the health care industry. Predicting and diagnosing cardiovascular diseases at a preliminary stage is essential to reduce the death rate. Machine learning algorithms play a very important role in predicting diseases and help to process raw data into useful

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (15)

T. Vivekanandan et al., A Hybrid Risk Assessment Model for Cardiovascular Disease Using Cox Regression Analysis and a 2-means clustering algorithm, Comput. Biol. Med. (2019)
H. Yan et al., A multilayer perceptron-based medical decision support system for heart disease diagnosis, Expert Syst. Appl. (2006)
A.H. Shahid et al., A Novel Approach for Coronary Artery Disease Diagnosis using Hybrid Particle Swarm Optimization based Emotional Neural Network, Biocybernetics and Biomedical Engineering (2020)
S.M.S. Shah et al., Support Vector Machines-based Heart Disease Diagnosis using Feature Subset, Wrapping Selection and Extraction Methods, Comput. Electr. Eng. (2020)
Y. Khan et al., Machine learning techniques for heart disease datasets: a survey
Pavithra, V., & Jayalakshmi, V. (2019, December). A Review on Predicting Cardiovascular Diseases Using Data Mining...
Pavithra, V., & Jayalakshmi, V. (2020, June). Review of Feature Selection Techniques for Predicting Diseases. In 2020...

There are more references available in the full text version of this article.

