Feature Selection by ModifiedBoostARoota and Classification by CatBoost model on High Dimensional Heart Disease Datasets

  • Anuradha. P, Avinashilingam Deemed to be University
Keywords: Heart disease, Feature Selection, CatBoost, Classification


Heart disease claims many lives, and early detection and prevention can help save them. Various machine learning algorithms have been applied to classify patients with and without heart disease. To predict the outcome efficiently, it is desirable to select the features that contribute most to the prediction; feature selection can reduce computation time and improve predictive performance. In this paper, the ModifiedBoostARoota (MBAR) algorithm is used for feature selection, and the classifiers CatBoost, XGBoost, Decision Tree, Extra Trees Classifier, Support Vector Classifier, Logistic Regression, K Nearest Neighbors, Naive Bayes and Random Forest are applied to the UCI Arrhythmia dataset and the UCI Z-Alizadeh Sani dataset. The Synthetic Minority Over-sampling Technique (SMOTE) is applied to balance the datasets. A comparison of the accuracy achieved with and without SMOTE shows that, after applying SMOTE, MBAR with the CatBoost classifier gives better accuracy: 92.76% on the Z-Alizadeh Sani dataset and 86.33% on the Arrhythmia dataset.
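The balancing step mentioned in the abstract can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is not the implementation used in the paper. The function names `smote` and `balance`, the parameter defaults, and the toy data are all hypothetical, and the sketch assumes a binary problem with numeric features only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Illustrative sketch only; names and defaults are assumptions.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (base itself excluded)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    new_points = smote(minority, max(n_majority - len(minority), 0), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Hypothetical toy data: 3 minority samples (label 1), 5 majority samples (label 0).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
     [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0], [7.0, 7.0]]
y = [1, 1, 1, 0, 0, 0, 0, 0]

X_bal, y_bal = balance(X, y, minority_label=1, k=2)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In an evaluation setting, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the test data. Whether the paper follows that protocol is not stated in the abstract.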

Regular paper