So here we are with another article on Artificial Intelligence/Machine Learning (AI/ML). Olyphaunt Solutions has developed ML models to perform predictive analysis on Covid-19 patient data from Wuhan. The project was implemented following the CRISP-DM methodology [1].
The aim of this project is to predict whether a Covid-19 patient will die, based on the patient's recorded parameters, using Machine Learning algorithms suited to the task to achieve good prediction accuracy.
The dataset was taken from Kaggle and is also available on GitHub [2]. It comprises patient details: age, gender, and symptoms (cold, cough, fever, asthma, difficulty in breathing, throat pain and pneumonia). These details were used as the input parameters for predicting the death of the patient.
Before applying any Machine Learning (ML) algorithms, the dataset had to be checked for missing values and duplicate records. This was handled with two data pre-processing techniques: Data Cleaning and Data Deduplication. To put the data into machine-readable form, Label Encoding was used to convert text labels into numeric values, and a Data Normalization technique, Feature Scaling, was applied to scale the values to the range 0 to 1. A Correlation Matrix was then used to identify the relationships between the variables in the dataset.
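The pre-processing steps above can be sketched roughly as follows. This is a minimal illustration using pandas and scikit-learn, not the project's actual code: the toy records, the column names (including the `death` target), and the `preprocess` helper are all assumptions made for the example, and dropping incomplete rows is only one of several possible cleaning strategies.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Hypothetical toy records; the column names and values are illustrative
# assumptions and are not taken from the real dataset.
raw = pd.DataFrame({
    "age": [34, 61, 61, 47, None, 72],
    "gender": ["male", "female", "female", "male", "female", "male"],
    "fever": [1, 1, 1, 0, 1, 1],
    "pneumonia": [0, 1, 1, 0, 0, 1],
    "death": [0, 1, 1, 0, 0, 1],
})


def preprocess(df):
    """Clean, deduplicate, label-encode and min-max scale the records."""
    # Data cleaning: drop rows that contain missing values
    # (assumed strategy; imputation would be an alternative).
    cleaned = df.dropna()
    # Data deduplication: drop rows that repeat an earlier row exactly.
    cleaned = cleaned.drop_duplicates().reset_index(drop=True)
    # Label encoding: map the text labels in "gender" to integers.
    cleaned["gender"] = LabelEncoder().fit_transform(cleaned["gender"])
    # Feature scaling: rescale every input column to the range 0 to 1.
    features = cleaned.drop(columns="death")
    scaled = pd.DataFrame(
        MinMaxScaler().fit_transform(features), columns=features.columns
    )
    scaled["death"] = cleaned["death"].to_numpy()
    return scaled


processed = preprocess(raw)
# Correlation matrix: pairwise Pearson correlation between all columns.
correlation = processed.corr()
print(correlation.round(2))
```

Scaling only the input columns and re-attaching the target afterwards keeps the label untouched, which makes the later model-training step easier to reason about.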
To determine the right Machine Learning (ML) model, we first had to choose between Supervised and Unsupervised Learning. We chose Supervised Learning, as it uses labelled datasets to train algorithms that classify data or predict outcomes. Supervised Learning can be further divided into two types: Regression and Classification. Regression algorithms predict continuous values; since our target (whether the patient died) is a discrete value, we chose Classification.
To predict the death of a patient, we implemented 5 classification models: Logistic Regression, Decision Tree, Random Forest, Support Vector Machines (SVM) and AdaBoost. The dataset was split into training and testing data in the ratio 80% to 20% respectively, so that the prediction accuracy of each model could be measured on data it had not seen during training.
We found that Random Forest was the best performing model with an accuracy of 96.31%, followed by AdaBoost with 95.85%. Logistic Regression, SVM and Decision Tree gave accuracies of 71.11%, 75.55% and 77.77% respectively. These accuracies represent the share of test records for which a model's prediction matched the actual outcome. In other words, the Random Forest model predicted the correct outcome for 96.31% of the patients in the test data. So, in future, it can be used as an input to plan the treatment for the patient.
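The comparison described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the project's code: the data is a synthetic stand-in generated with `make_classification`, the hyperparameters are library defaults (apart from `max_iter` and the random seeds, which are our choices), and the accuracies it prints will not match the figures reported in this article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the pre-processed patient data (assumption):
# 500 records, 9 numeric features, binary target.
X, y = make_classification(
    n_samples=500, n_features=9, n_informative=5, random_state=42
)

# 80% training data, 20% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The five classifiers named in the article, mostly with default settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
    "AdaBoost": AdaBoostClassifier(random_state=42),
}

# Accuracy = share of test records whose predicted label matches the actual one.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from most to least accurate.
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2%}")
```

Fixing `random_state` makes the split and the tree-based models reproducible, so repeated runs compare the models on exactly the same test records.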
Model vs Accuracy Table:
Actual vs Predicted Comparison for Different Models:
This machine learning project can be used as an input to improve patient care and decision-making in the healthcare industry. Based on a patient's symptoms, the model's prediction can help inform decisions about what kind of treatment needs to be provided.
How can Olyphaunt Solutions help:
At Olyphaunt Solutions, we specialize in designing and developing end-to-end solutions based on AI/ML and IoT technologies. Our expertise in Artificial Intelligence, Machine Learning, Sensors, and Internet of Things (IoT) is available for delivering scalable, robust solutions for industrial needs. For more information, please contact us.
References
[1] Using an Industry Standard Methodology for Data Mining, https://olyphaunt.com/blog/
[2] Covid-19 Patient Dataset (Wuhan), https://github.com/Atharva-Peshkar/Covid-19-Patient-Health-Analytics
By,
Shaunak Bachal
ML Engineer
Olyphaunt Solutions Pvt. Ltd.
For more information, please contact us at info@olyphaunt.com