Olyphaunt Solutions is an organization working on Digital Transformation for the healthcare sector. We regularly experiment with different datasets. Using our capabilities in Artificial Intelligence (AI) and Machine Learning (ML), we analyse patient data and finance data to explore the trends that patients show with respect to lifestyle and chronic diseases. This analysis helps the healthcare industry with marketing campaigns, loyalty programs and financial management.
In this article, we illustrate how Machine Learning can be used to understand the trend in the spread of Covid-19 cases, both nationally and state by state. Such analysis is important for planners from a public health standpoint.
The aim of this project is to predict the rise of Covid-19 cases in India and to analyse the 5 most affected states, using Machine Learning algorithms and following the CRISP-DM methodology [1].
The India Covid-19 Dataset was taken from GitHub [2]. It comprises date-wise Covid-19 positive cases over a span of 4 months in 2020, and records the following parameters for each case: Age, Gender, State and Diagnosed Date.
In the Data Preparation phase, we used Python to inspect the dataset and found that it contained a few null values and a few duplicate rows. Both were removed as part of data cleaning. Before plotting the curve of rising cases in India, we created a data frame holding the cumulative count of Covid-19 cases in India for each day.
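The preparation steps described above can be sketched roughly as follows. This is a minimal illustration with pandas, not the code used in the project: the toy records and the column names (`patient_id`, `diagnosed_date`, `state`, `gender`) are assumptions, and the real dataset's schema may differ.

```python
import pandas as pd

# Toy records standing in for the patient-level dataset.
# Column names and values are hypothetical, chosen only for illustration.
raw = pd.DataFrame({
    "patient_id": [1, 2, 2, 3, 4, 5],
    "diagnosed_date": ["2020-03-01", "2020-03-01", "2020-03-01",
                       None, "2020-03-02", "2020-03-04"],
    "state": ["Kerala", "Kerala", "Kerala", "Delhi", "Delhi", "Gujarat"],
    "gender": ["M", "F", "F", "M", "F", "M"],
})


def build_cumulative(df: pd.DataFrame) -> pd.DataFrame:
    """Clean the records and return one row per day with cumulative cases."""
    # Drop rows with null values, then drop exact duplicate rows.
    clean = df.dropna().drop_duplicates()
    clean = clean.assign(diagnosed_date=pd.to_datetime(clean["diagnosed_date"]))

    # Count new cases per diagnosed date.
    daily = clean.groupby("diagnosed_date").size()

    # Insert days on which no case was recorded, so the day index is contiguous.
    full_range = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
    daily = daily.reindex(full_range, fill_value=0)

    out = pd.DataFrame({"date": daily.index, "daily_cases": daily.to_numpy()})
    out["cumulative_cases"] = out["daily_cases"].cumsum()
    out["day"] = range(1, len(out) + 1)  # day 1 = first recorded case
    return out


cumulative = build_cumulative(raw)
print(cumulative)
```

The `day` column gives a simple numeric axis (day 1, day 2, ...) that a regression model can be trained on.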
We trained a Linear Regression model on 52 days of data and used it to obtain a forecast for the following 30 days (day 53 to day 82). The number of cases on day 82 crossed the 67,000 mark, a total increase of 49,542 cases over the 30-day period. The graph also depicts the gender divide between male and female cases: 72.4% of cases are male and 27.6% are female.
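As a rough sketch of this step, the snippet below fits a straight line to 52 days of cumulative counts and extrapolates it 30 days ahead. It uses NumPy's least-squares `polyfit` as a stand-in for whichever Linear Regression implementation was actually used, and the training counts are synthetic (a fixed daily increase), not the real figures. The helper name `fit_and_forecast` is hypothetical.

```python
import numpy as np

# Synthetic training data (NOT the real case counts): 52 days of cumulative
# cases that grow by a fixed amount per day.
train_days = np.arange(1, 53)             # day 1 .. day 52
train_cases = 100.0 * train_days + 50.0   # hypothetical cumulative counts


def fit_and_forecast(days, cases, horizon):
    """Fit cases = slope * day + intercept and extrapolate `horizon` days."""
    slope, intercept = np.polyfit(days, cases, deg=1)  # ordinary least squares
    future_days = np.arange(days[-1] + 1, days[-1] + horizon + 1)
    forecast = slope * future_days + intercept
    return future_days, forecast


future_days, forecast = fit_and_forecast(train_days, train_cases, horizon=30)

# Increase between the last training day and the last forecast day.
increase = forecast[-1] - train_cases[-1]
print(f"Forecast for day {future_days[-1]}: {forecast[-1]:.0f} cases")
print(f"Increase over the forecast window: {increase:.0f} cases")
```

A straight-line fit assumes a constant number of new cases per day, so the forecast is only as good as that assumption over the 30-day window.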
We split the dataset into 5 data frames, one for each of the 5 most affected states in India: Maharashtra, Gujarat, Delhi, Rajasthan, and Kerala. Using the Matplotlib and NumPy libraries in Python, we plotted the rise in Covid-19 cases for each state. As with the national data, we chose a Linear Regression model to predict the trend; the values from day 53 to day 82 are the model's predictions for each state.
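The state-wise split and forecast could look something like the sketch below. The table layout (`state`, `day`, `cumulative_cases`), the per-state growth rates, the extra `OtherState` row and the helper `forecast_state` are all made up for illustration; only the five state names come from the article.

```python
import numpy as np
import pandas as pd

# Synthetic long-format table: one row per (state, day) with cumulative cases.
# The daily growth rates are invented and do not reflect the real data.
rates = {"Maharashtra": 120, "Gujarat": 60, "Delhi": 55,
         "Rajasthan": 30, "Kerala": 10, "OtherState": 5}
cases = pd.DataFrame(
    [{"state": s, "day": d, "cumulative_cases": r * d}
     for s, r in rates.items() for d in range(1, 53)]
)

# Pick the 5 states with the highest cumulative count.
top_states = (cases.groupby("state")["cumulative_cases"]
              .max().nlargest(5).index.tolist())

# One data frame per state.
state_frames = {s: cases[cases["state"] == s].reset_index(drop=True)
                for s in top_states}


def forecast_state(frame: pd.DataFrame, horizon: int = 30) -> pd.DataFrame:
    """Fit a straight line to one state's curve and extrapolate it."""
    days = frame["day"].to_numpy()
    counts = frame["cumulative_cases"].to_numpy()
    slope, intercept = np.polyfit(days, counts, deg=1)
    future = np.arange(days.max() + 1, days.max() + horizon + 1)
    return pd.DataFrame({"day": future,
                         "predicted_cases": slope * future + intercept})


forecasts = {s: forecast_state(f) for s, f in state_frames.items()}
for state, fc in forecasts.items():
    print(state, round(fc["predicted_cases"].iloc[-1]))
```

Each entry in `forecasts` holds the predicted values for day 53 to day 82 and can be drawn next to the training curve with Matplotlib's `plt.plot`.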
Figure: State-wise Rise in Covid Cases Prediction
References
[1] Using an Industry Standard Methodology for Data Mining, https://olyphaunt.com/blog/
[2] Covid-19 Patient Dataset (India), https://github.com/PritamGuha31/COVID-19-Analysis
By,
Shaunak Bachal
ML Engineer
Olyphaunt Solutions Pvt. Ltd.
For more information, please contact us at info@olyphaunt.com