Proceedings of the International Conference on Smart Electronics and Communication (ICOSEC 2020)

IEEE Xplore Part Number: CFP20V90-ART; ISBN: 978-1-7281-5461-9

Regression Analysis of COVID-19 using Machine Learning Algorithms
Ekta Gambhir, B. Tech Student - CSE, Dr. Akhilesh Das Gupta Institute of Technology and Management, New Delhi, India
Ritika Jain, B. Tech Student - EEE, Dr. Akhilesh Das Gupta Institute of Technology and Management, New Delhi, India
Alankrit Gupta, B. Tech Student - CSE, Dr. Akhilesh Das Gupta Institute of Technology and Management, New Delhi, India
Uma Tomer, Assistant Professor - CSE, Dr. Akhilesh Das Gupta Institute of Technology and Management, New Delhi, India

Abstract— The outbreak of the Novel Coronavirus disease (COVID-19) in various parts of the world has affected the world as a whole and caused hundreds of thousands of deaths. It remains an ominous warning to public health and will be remembered as one of the greatest pandemics in world history. This paper aims to provide a better understanding of how various Machine Learning models can be applied to real-world situations. Apart from the analysis of worldwide figures, this paper also analyzes the current trend, or pattern, of COVID-19 transmission in India. With the help of datasets from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) and the Ministry of Health and Family Welfare of India, this study puts forward various trends and patterns experienced in different parts of the world and in India. The data studied covers 154 days, i.e., from January 22, 2020, to June 24, 2020. In future, the data can be analyzed further and more results can be obtained.

Keywords— COVID-19, Machine Learning, Data Analysis, Trend Analysis

I. INTRODUCTION

According to the World Health Organization (WHO), viral and infectious diseases continue to emerge and pose a serious threat to public health and well-being. Coronaviruses are a broad family of viruses that cause illnesses ranging from the common cold to severe respiratory disease. According to NCBI, several viral epidemics have been reported in the last 20 years, such as the Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) epidemic of 2002-2004 and the H1N1 influenza pandemic of 2009; most recently, the Middle East Respiratory Syndrome Coronavirus (MERS-CoV) had its first outbreak in Saudi Arabia in 2012 [1].

In the chronology of modern times, cases of unrecognized lower respiratory infections were first detected in mid-December 2019 in Wuhan, the largest metropolitan city in Hubei province of China. WHO named the disease behind this strange new pneumonia "COVID-19". WHO declared the outbreak a Public Health Emergency of International Concern (PHEIC) on January 30, 2020, by which time it had affected almost 20 countries [2]. There is no specific treatment for this virus so far, but the spread of infection can be reduced by maintaining personal hygiene and social distancing. There have been recoveries around the world, but the pandemic is still not under control.

This pandemic has affected the whole world not only in terms of health and hygiene but also in terms of the global economy. Apart from its adverse effects, COVID-19 has also had certain positive side effects around the world. While the world was facing losses, nature gained something during this pandemic: levels of harmful particulate matter in the air fell sharply, and the record-breaking ozone hole over the Arctic closed during this period, although that closure has been attributed to polar-vortex conditions rather than to reduced human activity. It is therefore important to understand the features and characteristics of this disease, to predict its further spread around the world, and to estimate how it will affect coming generations and people's lives once things return to normal.

The timeline of COVID-19 events across different nations [2] is shown in Fig. 1, and the percentage of confirmed cases per country is shown in Fig. 2.

Fig. 1. Timeline of the events of COVID-19 across different nations.

© IEEE 2020. This article is free to access and download, along with rights for full text and data mining, re-use and analysis
Authorized licensed use limited to: IEEE Xplore. Downloaded on August 11,2023 at 04:48:12 UTC from IEEE Xplore. Restrictions apply.
Fig. 2. Percentage of confirmed cases per country.

A. Data Acquisition and Description

The dataset is taken from the data repository for the "2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE), also supported by ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL)". It is a parameterized dataset with relevant parameters such as Province/State, Country/Region, Latitude, Longitude, and dates. Separate datasets are used for Confirmed, Death, and Recovered cases, each giving the number of cases on each day. The data used in this study covers 154 days, i.e., from January 22, 2020, to June 24, 2020. The data from these datasets were merged to obtain a parameterized dataset of the world for this period.

Regression analysis was chosen for this problem because of the type of dataset. Regression analysis is best suited to predicting a continuous dependent variable from one or more independent variables, and the case counts in this dataset are continuous. The coefficients in the regression equation (1) determine the relationship between the dependent variable Y and the independent variable X.

$$
Y = \theta_0 + \theta_1 X + \theta_2 X^2 + \cdots + \theta_m X^m + \text{residual error} \qquad (1)
$$

Here, $\theta_0$ is the bias, $\theta_1, \theta_2, \ldots, \theta_m$ are the weights of the polynomial regression equation, and $m$ is the degree of the polynomial.

B. Feature Selection
This step involves feature extraction and selection to obtain the best results from the model. Good features reveal the underlying structure of the data, and feature engineering significantly affects the performance of the model. It may involve splitting a feature, aggregating several features to produce new ones, or collecting data from external sources. Dimensionality reduction makes it easier to evaluate the dataset and draw conclusions from it. To draw better conclusions from this dataset, irrelevant parameters such as Longitude and Latitude were removed, and the dates were converted to date-time objects.

II. MODEL IMPLEMENTATION AND ANALYSIS

A. Implementation of Support Vector Machine Algorithm in Python
Support Vector Machine algorithms are supervised machine learning models used for data classification and regression analysis. An SVM constructs a hyperplane, or a set of hyperplanes, in an N-dimensional space that can be used for classification, regression, or outlier detection. The best parameters obtained for SVM in Python are shown in Fig. 3, Fig. 4 shows the values predicted using the SVM Algorithm, and Fig. 5 shows a graphical representation of these predictions.

Fig. 3. Best parameters obtained for SVM in Python.

Fig. 4. Predicted values using the SVM Algorithm.

Fig. 5. Predictions using the Support Vector Machine Algorithm.
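The paper does not publish its SVM code or the parameter grid behind Fig. 3. The following is a minimal, dependency-free sketch of the idea, assuming a linear kernel: an epsilon-insensitive support vector regressor trained by subgradient descent on the primal objective, plus a small grid search that mirrors the notion of "best params". The function names, the grid values, and the chronological 75/25 validation split are hypothetical choices; a practical implementation would more likely use a library SVR, possibly with a nonlinear kernel.

```python
"""Linear epsilon-insensitive SVR with a tiny grid search (illustrative sketch).

Assumption: a linear kernel trained by subgradient descent stands in for the
library SVR the paper presumably used; the parameter grid is hypothetical.
"""
from itertools import product


def fit_linear_svr(xs, ys, lam=1e-3, epsilon=0.05, epochs=2000, lr=0.5):
    """Minimize (lam / 2) * w**2 + mean(max(0, |y - (w*x + b)| - epsilon))."""
    w, b = 0.0, 0.0
    n = len(xs)
    for t in range(1, epochs + 1):
        grad_w, grad_b = lam * w, 0.0  # gradient of the L2 penalty on w
        for x, y in zip(xs, ys):
            r = y - (w * x + b)
            if abs(r) > epsilon:  # only points outside the epsilon-tube add loss
                sign = 1.0 if r > 0 else -1.0
                grad_w -= sign * x / n
                grad_b -= sign / n
        step = lr / t ** 0.5  # decaying step size, as usual for subgradient methods
        w -= step * grad_w
        b -= step * grad_b
    return w, b


def mse(w, b, xs, ys):
    """Mean squared error of the line w*x + b on (xs, ys)."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(xs, ys, lams=(1e-3, 1e-2), epsilons=(0.01, 0.05, 0.1)):
    """Pick (lam, epsilon) by validation MSE, then refit on the whole series.

    The series is time-ordered, so the first 75% trains and the rest validates.
    """
    cut = int(0.75 * len(xs))
    best_score, best_params = None, None
    for lam, eps in product(lams, epsilons):
        w, b = fit_linear_svr(xs[:cut], ys[:cut], lam=lam, epsilon=eps)
        score = mse(w, b, xs[cut:], ys[cut:])
        if best_score is None or score < best_score:
            best_score, best_params = score, {"lam": lam, "epsilon": eps}
    w, b = fit_linear_svr(xs, ys, **best_params)
    return best_params, w, b


if __name__ == "__main__":
    # Synthetic series, NOT the JHU data: day index scaled to [0, 1].
    days = [i / 19 for i in range(20)]
    counts = [2 * d + 1 for d in days]
    params, w, b = grid_search(days, counts)
    print("best params:", params, "slope:", round(w, 3), "intercept:", round(b, 3))
```

Scaling the day index to [0, 1] keeps the fixed step-size schedule reasonable; with raw day numbers and case counts in the hundreds of thousands, the inputs would need normalizing first. Because cumulative case counts grow faster than linearly, a linear SVR of this kind is expected to underfit such a series, which is one reason a polynomial or kernel model may fit better.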


B. Implementation of the Polynomial Regression Algorithm in Python
Polynomial Regression can be expressed as a special case of Linear Regression: although the fitted curve in equation (1) is nonlinear in X, the model is still linear in the coefficients $\theta_0, \ldots, \theta_m$. Linear Regression works on continuous data when the two variables (the target variable and the independent variable) are known to be correlated. If the variables are known to be correlated but the relationship does not look linear, polynomial regression can be used to fit a polynomial equation to the dataset. Polynomial Regression is a supervised Machine Learning algorithm that is trained on prior data and then tested on another dataset to validate its accuracy.
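Because the model in equation (1) is linear in its coefficients, it can be fitted by ordinary least squares on the features 1, X, X², ..., X^m. The sketch below does exactly that in pure Python (normal equations solved by Gaussian elimination), trains on earlier days, and scores the later, held-out days. The paper reports an "accuracy" of 93% without naming the metric; the coefficient of determination R² used here is only one common choice and is an assumption, as are the degree, the split, and the synthetic data.

```python
"""Least-squares polynomial regression for equation (1), in pure Python (sketch).

Assumptions: the degree, the chronological train/test split, and R^2 as the
"accuracy" score are illustrative choices, not taken from the paper.
"""


def fit_polynomial(xs, ys, degree):
    """Return [theta_0, ..., theta_m] solving (X^T X) theta = X^T y."""
    p = degree + 1
    X = [[x ** k for k in range(p)] for x in xs]  # columns 1, x, x^2, ..., x^m
    a = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    v = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix [a | v].
    m = [a[i] + [v[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    theta = [0.0] * p
    for r in range(p - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * theta[c] for c in range(r + 1, p))
        theta[r] = (m[r][p] - s) / m[r][r]
    return theta


def predict(theta, x):
    """Evaluate theta_0 + theta_1 * x + ... + theta_m * x**m."""
    return sum(t * x ** k for k, t in enumerate(theta))


def r2_score(theta, xs, ys):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(theta, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic quadratic trend, NOT the JHU data; x is a scaled day index.
    days = [d / 100 for d in range(100)]
    cases = [3 + 2 * d + 5 * d ** 2 for d in days]
    train_x, test_x = days[:80], days[80:]  # train on earlier days, test on later ones
    train_y, test_y = cases[:80], cases[80:]
    theta = fit_polynomial(train_x, train_y, degree=2)
    print("theta:", [round(t, 4) for t in theta])
    print("test R^2:", round(r2_score(theta, test_x, test_y), 4))
```

Scaling the day index keeps the normal equations well conditioned; with raw day numbers and higher degrees, the powers of X grow quickly, and a QR-based least-squares solver would be the safer choice.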
The train and test data have been transformed for polynomial regression in Python, as shown in Fig. 6. Fig. 7 shows the predicted values from August 7, 2020, to August 28, 2020, which are represented graphically in Fig. 8. The Polynomial Regression Algorithm shows an accuracy of 93%.

Fig. 8. Graphical representation of predictions using Polynomial Regression.

III. DAILY ESCALATION IN THE WORLD

To see the daily changes in cases across the world, the trend was plotted for Confirmed, Death, and Recovered cases. Fig. 9 shows the daily increase in the number of Confirmed cases, Fig. 10 shows the daily increase in the number of Confirmed Deaths, and Fig. 11 shows the daily increase in the number of Confirmed Recoveries across the world.

Fig. 6. Transformation of train and test data for polynomial regression in Python.
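The daily-increase curves can be derived from cumulative counts by summing the per-region rows into a world total for each date and then differencing consecutive days. The sketch below assumes, as in the JHU CSSE time-series files, that each Province/State row holds cumulative totals; the function names and sample numbers are hypothetical.

```python
"""Daily increases from cumulative JHU-style counts (illustrative sketch)."""


def world_totals(region_series):
    """Sum equal-length cumulative series (one per Province/State row) day by day."""
    return [sum(day) for day in zip(*region_series)]


def daily_increase(cumulative):
    """Day-over-day differences of a cumulative series.

    Assumption: the first day's increase is its cumulative value; the result
    can be negative when a source revises its counts downward.
    """
    if not cumulative:
        return []
    return [cumulative[0]] + [
        today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])
    ]


if __name__ == "__main__":
    # Two hypothetical regions over four days (made-up numbers).
    confirmed_rows = [[1, 2, 4, 7], [0, 1, 1, 3]]
    world = world_totals(confirmed_rows)
    print(world, daily_increase(world))
```

The same two steps apply to the Death and Recovered files and, with State/UT rows instead of Province/State rows, to the India curves.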

Fig. 9. Daily increase in the number of Confirmed cases.

Fig. 7. Predicted values from August 7, 2020, to August 28, 2020.

Fig. 10. Daily increase in the number of Confirmed Deaths.


Fig. 11. Daily increase in the number of Confirmed Recoveries.

IV. CASE STUDY: CURRENT TREND IN INDIA

A. Data Acquisition, Description and Feature Selection
This section describes the collection, processing, and assessment of the data/statistics of India used to evaluate the current trend of COVID-19 infections in India. Real-time data has been collected from the website of the Ministry of Health and Family Welfare of India, https://www.mohfw.gov.in/. The information has been scraped from the website using the BeautifulSoup module in Python. The scraped data and the analysis of the current trend in India are available at https://github.com/Ritikajain18/COVID-19-INDIA/blob/master/covid_scrapped_data.ipynb. The first case of COVID-19 in India was detected on January 30, 2020, in Kerala, in a student who had returned to India from Wuhan. Data from January 30, 2020, to June 24, 2020, has been used to evaluate India's statistics on the ongoing pandemic. The data covers all the States and Union Territories (States/UTs) of India, with the number of Active, Recovered, Death, and Total Confirmed cases. To tally and easily work with the acquired data, it is sorted as shown in Fig. 12 so that the top 10 States/UTs with the maximum number of cases are obtained.

Fig. 12. Sorted data of the top 10 States/UTs with maximum cases.

B. Reason for studying State/UT-wise Data
India is the second-most populous country in the world, with a total area of 3.287 million km², so analyzing the spread of infection state-wise will help the government utilize the available resources efficiently [3]. India does not have the best health care facilities in the world, and given the population of the country, it is essential to use its resources as efficiently and effectively as possible. Despite not having health care facilities as good as those of developed nations such as the USA, Italy, and Germany, India has been managing this pandemic as well as possible.

Further investigation involved the daily increase in the number of confirmed, death, and recovered cases, and graphs have been plotted for the same. Fig. 13 and Fig. 14 show the daily increase in the number of Confirmed and Recovered cases in India, respectively.

Taking the different states of India into account, charts were plotted for these States/UTs in terms of Total Confirmed Cases, Active Cases, and Total Closed Cases (Recovered plus Deaths). Fig. 15 shows the plot for the number of Total Confirmed Cases. This graph clearly shows that Maharashtra has the maximum number of cases among the top 10 States/UTs, followed by Delhi and Tamil Nadu. Fig. 16 shows the plot for the number of Active Cases, and Fig. 17 shows the plot for the number of Closed Cases.

Fig. 13. Daily increase in confirmed cases.

Fig. 14. Daily increase in recovered cases.

One of the most astounding aspects of COVID-19 in India is that, despite India being the second-most populous country in the world, its mortality rate is the lowest among countries with more than 4 lakh (400,000) cases. Fig. 18 shows the plot for the mortality rate. The graph clearly shows that Gujarat has the highest mortality rate, followed by Maharashtra.
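The acquisition step in Section IV-A (scraping the State/UT table from the MoHFW website, then sorting it to obtain the top 10 States/UTs) can be sketched as follows. To keep the example dependency-free, the standard library's `html.parser.HTMLParser` is swapped in for BeautifulSoup, and the example parses an inline sample rather than the live page. The column order, the sample HTML and its numbers, and all function names are hypothetical, since the paper does not describe the page layout; a live page could be fetched with `urllib.request.urlopen(url).read().decode("utf-8")`.

```python
"""Parse a State/UT table with the standard library (illustrative sketch).

Assumptions: html.parser stands in for BeautifulSoup, and the column order
(S. No., State/UT, Active, Recovered, Deaths, Confirmed) is hypothetical.
"""
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_states(html):
    """Map each State/UT name to its active, recovered, deaths and confirmed counts."""
    parser = TableParser()
    parser.feed(html)
    states = {}
    for row in parser.rows:
        if len(row) != 6 or not row[0].isdigit():
            continue  # skip the header row and any total/footnote rows
        counts = [int(cell.replace(",", "")) for cell in row[2:]]
        states[row[1]] = dict(zip(("active", "recovered", "deaths", "confirmed"), counts))
    return states


def top_n(states, n=10, key="confirmed"):
    """The n States/UTs with the largest value of `key`, in descending order."""
    return sorted(states.items(), key=lambda item: item[1][key], reverse=True)[:n]


# Made-up sample table; the real page layout may differ.
SAMPLE_HTML = """
<table>
  <tr><th>S. No.</th><th>Name of State / UT</th><th>Active</th>
      <th>Recovered</th><th>Deaths</th><th>Total Confirmed</th></tr>
  <tr><td>1</td><td>State A</td><td>1,200</td><td>3,000</td><td>100</td><td>4,300</td></tr>
  <tr><td>2</td><td>State B</td><td>5,000</td><td>9,500</td><td>500</td><td>15,000</td></tr>
  <tr><td>3</td><td>State C</td><td>40</td><td>55</td><td>5</td><td>100</td></tr>
  <tr><td></td><td>Total</td><td>6,240</td><td>12,555</td><td>605</td><td>19,400</td></tr>
</table>
"""

if __name__ == "__main__":
    for name, counts in top_n(parse_states(SAMPLE_HTML), n=2):
        print(name, counts)
```

Filtering on a numeric first cell is a simple way to drop header and total rows; with BeautifulSoup the same logic would iterate over the rows returned by its `find_all("tr")`.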


Fig. 15. Plot for the number of Total Confirmed Cases.

Fig. 16. Plot for the number of Active Cases.

Fig. 17. Plot for the number of Closed Cases
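The three State/UT plots above rest on a simple identity: closed cases are those with an outcome (Recovered plus Deaths), and active cases are the remainder of Total Confirmed. A minimal sketch of that breakdown follows, assuming each State/UT is given as a (confirmed, recovered, deaths) tuple; the function name and sample figures are hypothetical.

```python
"""Active and closed cases per State/UT (illustrative sketch)."""


def case_breakdown(states):
    """Rows of (name, confirmed, active, closed), sorted by confirmed, descending.

    `states` maps a State/UT name to a (confirmed, recovered, deaths) tuple.
    Closed cases have an outcome (recovery or death); active cases do not yet.
    """
    rows = []
    for name, (confirmed, recovered, deaths) in states.items():
        closed = recovered + deaths
        active = confirmed - closed
        if active < 0:
            raise ValueError(f"{name}: recovered + deaths exceeds confirmed")
        rows.append((name, confirmed, active, closed))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    sample = {"State A": (100, 60, 5), "State B": (300, 100, 20)}  # made-up numbers
    for row in case_breakdown(sample):
        print(row)
```

The negative-active check is a cheap sanity test on scraped data, where a mis-parsed column would otherwise silently produce impossible values.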


Fig. 18. Plot of the mortality rate.
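The paper does not state how the mortality rate in Fig. 18 is computed. A common reading, assumed here, is the case fatality rate: deaths as a percentage of confirmed cases. The sketch below computes it per State/UT and ranks the results; the input format, function name, and sample numbers are hypothetical. This ratio is sensitive to testing levels and to the lag between confirmation and outcome, so it is best compared across regions at the same date.

```python
"""State/UT-wise mortality rate (illustrative sketch)."""


def mortality_rates(states):
    """Deaths as a percentage of confirmed cases, highest first.

    `states` maps a State/UT name to a (confirmed, deaths) pair; entries with
    zero confirmed cases are skipped to avoid dividing by zero.
    """
    rates = {
        name: 100.0 * deaths / confirmed
        for name, (confirmed, deaths) in states.items()
        if confirmed > 0
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = {"State A": (200, 10), "State B": (100, 1), "State C": (0, 0)}  # made-up
    print(mortality_rates(sample))  # [('State A', 5.0), ('State B', 1.0)]
```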

V. CONCLUSION AND FUTURE SCOPE
This research paper analyzed the current trend of COVID-19 transmission in the world. Anticipating the further spread of COVID-19, the disease caused by the novel coronavirus, will help in taking the necessary actions to control its spread. The paper also presented a comprehensive study of the outbreak situation in India, which will further help in taking the necessary steps to manage India's huge population. Two Machine Learning models were used for this purpose; in future, Deep Learning models or hybrid models can be used to forecast the further spread of the virus. Real-time data, rather than conventional data, has been used for this analysis. The validity of real-time data depends strongly on time, and its response-time requirements come from the external world. Unlike conventional data, which needs to satisfy every case, real-time data can be relaxed in a few cases; it is approximately correct, with performance metrics calculated as the number of transactions missing their deadlines per unit time. From our analysis, it can be concluded that the Polynomial Regression Algorithm performs better than the Support Vector Machine Algorithm, showing an accuracy of approximately 93% in predicting the rise in cases for the next 60 days, i.e., for the months of July and August. The case study in this paper puts forward and answers the question “Why has the data analysis for India been done State/UT-wise?”. This analysis also raises the question “Why is the mortality rate of India so low despite India being the second-most populous country in the world?”. More attributes can be included in the study to improve accuracy. Further investigation may involve predicting the number of future cases from the India dataset and studying how the mortality rate varies as the number of cases rises. We hope this article contributes to the world's response to this epidemic and provides some references for further research.

REFERENCES
[1] “Features, Evaluation and Treatment Coronavirus (COVID-19)”, NCBI, Available: https://www.ncbi.nlm.nih.gov/books/NBK554776/, May 18 (2020).
[2] N. S. Punn, S. K. Sonbhadra, S. Agarwal, “COVID-19 Epidemic Analysis using Machine Learning and Deep Learning Algorithms”, medRxiv, Available: https://doi.org/10.1101/2020.04.08.20057679, June 1 (2020).
[3] P. Ghosh, R. Ghosh, B. Chakraborty, “COVID-19 in India: State-wise Analysis and Prediction”, medRxiv, Available: https://doi.org/10.1101/2020.04.24.20077792, May 19 (2020).
[4] S. F. Ardabili, A. Mosavi, P. Ghamisi, F. Ferdinand, A. R. Varkonyi-Koczy, U. Reuter, T. Rabczuk, P. M. Atkinson, “COVID-19 Outbreak Prediction with Machine Learning”, Available at SSRN: https://ssrn.com/abstract=3580188 or http://dx.doi.org/10.2139/ssrn.3580188, April 19 (2020).
[5] F. Petropoulos, S. Makridakis, “Forecasting the novel coronavirus COVID-19”, Available: https://doi.org/10.1371/journal.pone.0231236, Published: March 31 (2020).


[6] Arti M. K., K. Bhatnagar, “Modeling and Predictions for COVID 19 Spread in India”, Available: DOI: 10.13140/RG.2.2.11427.81444, Published: April (2020).
[7] H. Shekhar, “Prediction of Spreads of COVID-19 in India from Current Trend”, medRxiv, Available: https://doi.org/10.1101/2020.05.01.20087460, May 06 (2020).
[8] L. Yan, H. Zhang, J. Goncalves et al., “An interpretable mortality prediction model for COVID-19 patients”, Nature Machine Intelligence 2, Available: https://doi.org/10.1038/s42256-020-0180-7, pp. 283–288, Published: 14 May (2020).
[9] L. Li, Z. Yang, Z. Dang, C. Meng, J. Huang, H. Meng, D. Wang, G. Chen, J. Zhang, H. Peng, Y. Shao, “Propagation analysis and prediction of the COVID-19”, Infectious Disease Modelling, vol. 5, Available: https://doi.org/10.1016/j.idm.2020.03.002, pp. 282-292, (2020).
[10] S. Zhao, H. Chen, “Modeling the epidemic dynamics and control of COVID-19 outbreak in China”, Quantitative Biology, vol. 8, Issue 1, Available: https://doi.org/10.1007/s40484-020-0199-0, pp. 11–19, March (2020).
[11] J. Xie, Z. Tong, X. Guan et al., “Critical care crisis and some recommendations during the COVID-19 epidemic in China”, Intensive Care Med, vol. 6, Issue 6, Available: https://doi.org/10.1007/s00134-020-05979-7, pp. 837–840, June (2020).
[12] W. C. Roda, M. B. Varughese, D. Han, M. Y. Lia, “Why is it difficult to accurately predict the COVID-19 epidemic?”, Infectious Disease Modelling, vol. 5, Available: https://doi.org/10.1016/j.idm.2020.03.001, pp. 271-281, (2020).
[13] W. Naudé, “Artificial intelligence vs COVID-19: limitations, constraints and pitfalls”, Available: doi: 10.1007/s00146-020-00978-0, pp. 1–5, Apr 28 (2020).
[14] R. Gupta, S. K. Pal, G. Pandey, “A Comprehensive Analysis of COVID-19 Outbreak Situation in India”, Available: DOI: 10.1101/2020.04.08.20058347, Published: April (2020).
[15] V. Bindhu, “Biomedical Image Analysis Using Semantic Segmentation”, Journal of Innovative Image Processing (JIIP), vol. 1, no. 02, pp. 91-101, (2019).
[16] A. Chandy, “A Review On IoT Based Medical Imaging Technology For Healthcare Applications”, Journal of Innovative Image Processing (JIIP), vol. 1, no. 01, pp. 51-60, (2019).
