Machine Learning Techniques for Early Classification of COVID-19 Disease in Patients
Abstract
Coronavirus disease (COVID-19), caused by the SARS-CoV-2 virus, has been ravaging countries around the world for more than two years. It has been reported that people infected with the virus may experience mild or moderate respiratory illness and recover without requiring special treatment. Machine learning techniques have been identified as very promising for establishing evidence of COVID-19 in patients. Some past studies focused on the use of clinical images for COVID-19 classification. In contrast, this work used the medical symptoms recorded in the chosen dataset for classification. The study specifically investigated the performance of two ensemble models when the dataset is pre-processed and the models are trained on a selected subset of promising features. Exploratory analysis was first carried out on the dataset to better understand its patterns. Then, a categorical variable in the dataset was encoded, and a subset of features was selected using a feature importance method. Thereafter, the Random Forest (RF) and AdaBoost algorithms were used to build coronavirus classification models from the dataset. The results showed that the two ensemble models performed better when a filter-based feature selection technique was applied to the dataset than when all the features were used to build the COVID-19 classification models. For instance, the RF-based model recorded an accuracy of 0.89 without and 0.96 with filter-based feature selection. Similarly, the AdaBoost-based model recorded an accuracy of 0.90 without and 0.97 with feature subset selection.
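The workflow summarized in the abstract (encode a categorical variable, select a feature subset, then compare Random Forest and AdaBoost with and without selection) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn and NumPy, uses synthetic stand-in data rather than the study's dataset, and uses a chi-squared filter score because the abstract does not name the scoring function. The column names, the label rule, and the value k=4 are hypothetical.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder

rng = np.random.default_rng(0)
n_samples = 600

# Synthetic stand-in data (NOT the paper's dataset): eight binary symptom
# flags plus one categorical column.
symptoms = rng.integers(0, 2, size=(n_samples, 8))
gender = rng.choice(["female", "male"], size=n_samples)

# Hypothetical label rule: positive when at least two of the first three
# symptoms are present, with about 10% of labels flipped as noise.
labels = (symptoms[:, :3].sum(axis=1) >= 2).astype(int)
flip = rng.random(n_samples) < 0.10
labels = np.where(flip, 1 - labels, labels)

# Step 1: encode the categorical variable as numeric codes.
gender_encoded = OrdinalEncoder().fit_transform(gender.reshape(-1, 1))
X = np.hstack([symptoms, gender_encoded])

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)

# Step 2: filter-based selection, fitted on the training split only.
# The chi-squared score and k=4 are assumptions made for this sketch.
selector = SelectKBest(score_func=chi2, k=4).fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)


def evaluate(model, train_features, test_features):
    """Fit the model on the training split and return test accuracy."""
    model.fit(train_features, y_train)
    return accuracy_score(y_test, model.predict(test_features))


# Step 3: train both ensembles with and without feature selection.
model_factories = {
    "Random forest": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": lambda: AdaBoostClassifier(n_estimators=50, random_state=0),
}
results = {}
for name, make_model in model_factories.items():
    results[name] = {
        "all_features": evaluate(make_model(), X_train, X_test),
        "selected_features": evaluate(make_model(), X_train_sel, X_test_sel),
    }

for name, scores in results.items():
    print(f"{name}: all={scores['all_features']:.2f}, "
          f"selected={scores['selected_features']:.2f}")
```

Because the data here are synthetic, the printed accuracies will not match the values reported in the abstract, and selection is not guaranteed to improve them. The sketch only shows where each step sits in the pipeline and why the selector is fitted on the training split before either model sees the test data.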
Copyright (c) 2022 Journal of Information Communication Technologies and Robotic Applications

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.