Many people today are affected by one or more diseases, and analysing and diagnosing disease remains a difficult task for health care organizations. These organizations hold huge amounts of clinical data, and without an intelligent system it is exceedingly difficult to extract useful medical information from it. Over the past decades, data mining techniques have had a major effect on extracting information from data sets and predicting human disease. Many new technologies have been developed, but for disease prediction, data mining combined with machine learning techniques plays a very effective role [1]. With the help of machine learning techniques, prediction models can be built for numerous diseases, such as breast cancer, brain tumour, ovarian cancer, lung cancer, and heart disease. Current research states that cardiovascular disease is the most frequent disease faced by people all over the world.
In 2017, the WHO reported that the global death rate from cardiovascular disease is extremely high. Men are affected by cardiovascular disease more often than women. To reduce the death rate, it is important to predict the disease at an early stage. Even experts can find it exceedingly difficult to identify coronary artery disease (CAD), because it is associated with many different signs and risk factors, such as age, blood pressure (BP), diabetes, contraction of the heartbeat, heartburn, and sweating. There are different types of heart disease with various symptoms, such as CAD (a blockage in the arteries), heart failure (HF; blood does not circulate properly), and congenital heart disease (CHD; the heart does not form properly in the womb) [2]. In machine learning, predicting CAD is a classification problem that involves three major steps: 1) exploratory data analysis, 2) feature selection, and 3) model creation and prediction. Redundant and duplicate records are removed during exploratory data analysis; next, the important features that contribute to predicting the target are identified using feature selection. During feature selection, the features are grouped into ordinal and categorical data, and a suitable technique is applied to each group. These steps improve the accuracy of the model and reduce its total execution time, since less computation is required (Fig. 1 and Fig. 2).
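The first two steps described above can be illustrated with a minimal sketch. The toy records, the feature names (`age`, `resting_bp`, `noise`), and the helper functions are hypothetical and exist only for illustration. The scoring rule, absolute Pearson correlation with the target, is an assumption chosen for simplicity and is not necessarily the technique used in this work; ordinal and categorical features would normally call for other scores.

```python
from statistics import mean

# Hypothetical toy records: (age, resting_bp, noise, has_cad).
# These values are invented for illustration only.
records = [
    (63, 145, 2, 1),
    (41, 120, 1, 0),
    (63, 145, 2, 1),  # exact duplicate of the first record
    (57, 150, 1, 1),
    (35, 118, 2, 0),
    (48, 130, 2, 0),
    (66, 160, 1, 1),
]
feature_names = ["age", "resting_bp", "noise"]


def drop_duplicates(rows):
    """Exploratory step: keep the first occurrence of each record."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def rank_features(rows, names):
    """Feature selection step: score each feature against the target
    (last column) and sort from most to least relevant."""
    target = [row[-1] for row in rows]
    scores = {}
    for i, name in enumerate(names):
        column = [row[i] for row in rows]
        scores[name] = abs(pearson(column, target))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


clean = drop_duplicates(records)
ranking = rank_features(clean, feature_names)
selected = [name for name, _ in ranking[:2]]  # keep the two top-scoring features
print(len(records), "->", len(clean), "records; selected:", selected)
```

Only the selected features would then be passed to the classifier in the third step, which is where the reduction in computation comes from.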
Feature selection plays an important role in reducing overfitting, because duplicated or redundant features can lead to wrong predictions. A model may reach high accuracy on the training data and still perform poorly on the testing data; this issue is known as “variance and bias”. A good model should have low variance and low bias, and feature selection helps to achieve this, so that the model performs well on both training and testing data (Fig. 3).
Feature selection methods are of three types: 1) filter methods, 2) wrapper methods, and 3) embedded methods, all of which are used to select important features [3]. Filter methods are computationally cheap because they do not involve any machine learning model; the other methods are more effective than filter methods but computationally costlier. The model used for classification needs to be validated with an evaluation technique. For classification problems, the following measures are used: 1) accuracy, 2) precision, 3) sensitivity, 4) recall (another name for sensitivity in binary classification), and 5) kappa. Depending on the model and the type of data set, the model is evaluated with a suitable measure, and the best model is chosen based on the score obtained.
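As an illustration of the evaluation measures listed above, the following minimal sketch computes them from the confusion matrix of a binary classifier. The function names and the example labels are hypothetical. Sensitivity and recall are the same quantity in the binary case, so they are computed once, and "kappa" is assumed here to mean Cohen's kappa.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Return accuracy, precision, sensitivity (recall) and Cohen's kappa."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Agreement expected by chance, from the row and column totals.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,  # identical to recall
        "kappa": kappa,
    }


# Hypothetical labels: 1 = disease present, 0 = disease absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
scores = evaluate(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

In a screening setting, sensitivity is often watched more closely than accuracy, because a missed case (false negative) is usually costlier than a false alarm; kappa additionally corrects the accuracy for agreement that would occur by chance.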