Hybrid feature selection technique for prediction of cardiovascular diseases

https://doi.org/10.1016/j.matpr.2021.03.225
Referred to by
Materials Today: Proceedings, Volume 81, Part 2, 2023, Pages 90

Abstract

Diagnosing a disease takes time and requires sophisticated technical methods. Smart technologies have grown rapidly in the healthcare industry; they improve patients' daily lives and reduce both the workload and the treatment cost in healthcare organizations. Disease prediction is one of the major challenges society faces today. A recent survey also states that the death rate from coronary artery disease (CAD) is remarkably high because many people are affected by cardiovascular diseases. Predicting and diagnosing cardiovascular diseases at a preliminary stage is essential to reduce the death rate. Earlier studies used machine learning techniques to predict diseases, but they did not give proper attention to identifying relevant features with suitable feature selection methods. This paper proposes a new feature selection technique, HRFLC (Random Forest + AdaBoost + Pearson coefficient). The method helps to predict the disease efficiently and improves prediction accuracy.

Introduction

Nowadays many people are affected by diseases of various kinds. Analysing and diagnosing a disease is a difficult task for health care organizations, which hold a huge amount of clinical data. Without an intelligent system, it is exceedingly difficult to extract useful information from this medical data. Over recent decades, data mining techniques have had a large effect on extracting information from data sets and predicting human disease. Many new technologies have been developed, but for disease prediction, data mining combined with machine learning techniques plays a very effective role [1]. With machine learning techniques, a prediction model can be built for numerous diseases such as breast cancer, brain tumour, ovarian cancer, lung cancer, and heart disease. Current research states that cardiovascular disease is the most frequent disease worldwide.
In 2017, the WHO reported that the global death rate from cardiovascular diseases is extremely high. Men are affected by cardiovascular disease more often than women. To reduce the death rate, it is important to predict the disease at an early stage. Even experts can find it exceedingly difficult to identify coronary artery disease (CAD), because it has many different signs and risk factors, such as age, blood pressure, diabetes, contraction of heartbeat, heartburn, and sweating. Heart disease takes several forms with different symptoms, for example CAD (a blockage in the arteries), heart failure (HF, improper circulation of blood), and congenital heart disease (CHD, improper formation of the heart in the womb) [2]. Predicting CAD is a classification problem in machine learning, and it involves three major steps: 1) exploratory data analysis, 2) feature selection, and 3) model creation and prediction. Redundant and duplicate records are removed during exploratory data analysis; next, the important features that contribute to the target prediction are identified through feature selection. In feature selection, the data are categorized as ordinal or categorical, and suitable techniques are applied to each type. These steps improve the accuracy of the model and reduce its total execution time, as less computation is required (Fig. 1 and Fig. 2).
Feature selection plays an important role in reducing overfitting, because identical or redundant features lead to wrong predictions. A model may achieve high accuracy on the training data yet perform poorly on the testing data; this issue is known as the “variance and bias” problem. A good model should have low variance and low bias, and feature selection helps achieve this so that the model performs well on both training and testing data (Fig. 3).
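To make the idea of a redundant feature concrete, the following is a minimal sketch of a correlation-based redundancy check. It is not the paper's method. The function names `pearson` and `drop_redundant`, the 0.95 threshold, and the toy values are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / sqrt(var_x * var_y)


def drop_redundant(columns, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept.

    `columns` maps feature name -> list of values.
    `threshold` is an assumed cutoff, not a value taken from the paper.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: "bp_copy" is an exact rescaling of "bp",
# so it adds no information and should be dropped.
toy = {
    "age": [63, 45, 58, 39, 71, 50],
    "bp": [120, 145, 118, 150, 130, 138],
    "bp_copy": [240, 290, 236, 300, 260, 276],
}
kept_features = drop_redundant(toy)
print(kept_features)
```

A check of this kind needs no trained model, which is why it is cheap to run before any classifier is fitted.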
Feature selection methods are of three types: 1) filter methods, 2) wrapper methods, and 3) embedded methods [3]. Filter methods are computationally cheap because they do not involve any machine learning model; wrapper and embedded methods are more effective than filter methods but computationally costlier. The model used for classification needs to be validated with an evaluation technique. For a classification problem, the following measures are used: 1) accuracy, 2) precision, 3) sensitivity, 4) recall, and 5) kappa. Note that sensitivity and recall denote the same quantity. Depending on the model and the type of dataset, we evaluate the model with a suitable evaluation technique and choose the best model based on the score obtained.
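The evaluation measures listed above can all be derived from the four counts of a binary confusion matrix. The sketch below shows one way to compute them. It assumes that "kappa" refers to Cohen's kappa, which the text does not state explicitly, and the function name `classification_metrics` and the example counts are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common binary-classification scores from confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives, and true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Sensitivity and recall are two names for the same ratio.
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # Cohen's kappa (assumed meaning of "kappa"): agreement beyond chance.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "kappa": kappa,
    }


# Hypothetical counts for 100 test patients.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting kappa alongside accuracy is useful when the classes are imbalanced, because kappa discounts the agreement a classifier would reach by chance.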

Related work

Qinglin et al. [4] proposed an efficient framework to diagnose a disease and to identify a feature subset from a large dataset. The proposed method has two parts: i) a genetic algorithm (GA) generates the initial positions and grey wolf optimization (GWO) updates the current positions; ii) KELM is used to obtain better classification. The two parts are then combined into a hybrid method (GWO + KELM) to improve accuracy. This

Background study

Various methods are employed for disease prediction, using either traditional methods or machine learning algorithms. However, these methods achieve only low accuracy, sensitivity, and specificity. A good model should balance sensitivity and specificity. Existing models such as logistic regression, recursive feature selection, genetic algorithms, and deep learning have performed well to a certain extent and achieved a certain level of accuracy. There is much research published based on the

Proposed method HRFLC

In this paper we use a new feature selection technique that combines Random Forest, AdaBoost, and linear correlation, each compared against the target variable. Based on this technique, the important features are identified and supplied to different machine learning techniques. The method is implemented in a Python environment to identify the significant features of the dataset. It provides a graphical representation of the dataset, work environment and predictive
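The excerpt names the three ingredients of HRFLC but not how their scores are merged, so the following is only a hedged sketch of one plausible combination, not the authors' implementation. It assumes scikit-learn and NumPy are available, averages the three normalised scores with equal weight (an assumption), and uses synthetic data with 13 features (also an assumption). The helper names `hybrid_feature_scores` and `select_top_k` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier


def hybrid_feature_scores(X, y, random_state=0):
    """Average three normalised relevance scores per feature.

    The equal-weight average is an assumption; the source does not
    specify how the three signals are combined.
    """
    rf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    ada = AdaBoostClassifier(n_estimators=50, random_state=random_state)
    rf.fit(X, y)
    ada.fit(X, y)

    # Absolute Pearson correlation of each feature with the target.
    pearson = np.abs(
        [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]
    )
    pearson = np.nan_to_num(pearson)  # constant columns yield NaN

    def normalise(values):
        total = values.sum()
        return values / total if total > 0 else values

    return (
        normalise(rf.feature_importances_)
        + normalise(ada.feature_importances_)
        + normalise(pearson)
    ) / 3.0


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features."""
    return np.argsort(scores)[::-1][:k]


# Synthetic stand-in data; the real study uses a heart disease dataset.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=5, n_redundant=2,
    random_state=0,
)
scores = hybrid_feature_scores(X, y)
selected = select_top_k(scores, k=11)
print(sorted(selected.tolist()))
```

Normalising each score vector before averaging keeps any one of the three signals from dominating purely because of its scale.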

Results and discussions

Comparisons are made between results with the feature selection techniques described in the paper and results without feature selection. Based on the HRFLC algorithm, 11 features are selected and the evaluation parameters are compared. The results show that accuracy improves with the proposed model.
The 11 features selected by the model are
'age', 'sex', 'chest', 'resting_blood_pressure', 'serum_cholestoral', 'fasting_blood_sugar', 'maximum_heart_rate_achieved', 'oldpeak', 'slope', 'number_of_major_vessels', 'thal'. The below

Conclusion

Identifying and diagnosing a disease at an early stage can save human lives. Many people are affected by cardiovascular diseases, and preventing them is a major challenge for the health care industry. Predicting and diagnosing cardiovascular diseases at a preliminary stage is essential to reduce the death rate. Machine learning algorithms play a very important role in predicting diseases, and they help to process raw data into useful

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
