
medRxiv preprint doi: https://doi.org/10.1101/2020.05.06.20091900; this version posted May 11, 2020.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
PREDICTING THE GROWTH AND TREND OF COVID-19 PANDEMIC USING MACHINE LEARNING AND CLOUD COMPUTING
Shreshth Tuli
Department of Computer Science and Engineering, Indian Institute of Technology - Delhi, India
[email protected]

Shikhar Tuli
Department of Electrical Engineering, Indian Institute of Technology - Delhi, India
[email protected]

Rakesh Tuli
Department of Biotechnology Engineering, UIET, Panjab University, Chandigarh, India
[email protected]

Sukhpal Singh Gill
School of EECS, Queen Mary University of London, United Kingdom
[email protected]

May 4, 2020
ABSTRACT
The outbreak of COVID-19, the disease caused by the coronavirus SARS-CoV-2, has created a calamitous situation
throughout the world. The cumulative incidence of COVID-19 is increasing rapidly day by day.
Machine Learning (ML) and Cloud Computing can be deployed very effectively to track the disease,
predict the growth of the epidemic and design strategies and policies to manage its spread. This study
applies an improved mathematical model to analyse and predict the growth of the epidemic. An
ML-based improved model has been applied to predict the potential threat of COVID-19 in countries
worldwide. We show that using iterative weighting for fitting the Generalized Inverse Weibull distribution,
a better fit can be obtained to develop a prediction framework. This can be deployed on a cloud
computing platform for more accurate and real-time prediction of the growth behavior of the epidemic.
A data-driven approach with higher accuracy, as presented here, can be very useful for a proactive response from
the government and citizens. Finally, we propose a set of research opportunities and set up grounds
for further practical applications. Predicted curves for some of the most affected countries can be
seen at https://collaboration.coraltele.com/covid/.
Keywords COVID-19; Machine Learning; SARS-CoV-2; Coronavirus; Prediction; Cloud Computing
1 Introduction
The novel Coronavirus disease (COVID-19) was first reported on 31 December 2019 in Wuhan, Hubei Province,
China, and then spread rapidly across the world [1]. The cumulative incidence of the causative virus (SARS-CoV-2)
is rapidly increasing and has affected 196 countries and territories, with the USA, Spain, Italy, the U.K. and France being the
most affected [2]. The World Health Organization (WHO) has declared the coronavirus outbreak a pandemic, while the virus
continues to spread [3]. As of 4 May 2020, a total of 3,581,884 confirmed positive cases had been reported, leading
to 248,558 deaths [2]. The major difference between the pandemic caused by CoV-2 and related viruses, like SARS
and MERS, is the ability of CoV-2 to spread rapidly through human contact and leave nearly 20% of infected subjects as
0 Abbreviations: ML, Machine Learning; SARS-CoV-2, Severe Acute Respiratory Syndrome Coronavirus 2; COVID-19, Coronavirus disease
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.
symptom-less carriers [4]. Moreover, various studies have reported that the disease caused by CoV-2 is more dangerous for
people with a weak immune system. Elderly people and patients with life-threatening diseases like cancer, diabetes,
neurological conditions, coronary heart disease and HIV/AIDS are more vulnerable to the severe effects of COVID-19 [5].
In the absence of any curative drug, the only solution is to slow down the spread by exercising "social distancing"
to block the chain of spread of the virus. This behavior of CoV-2 requires developing a robust mathematical basis for
tracking its spread and automating the tracking tools for online dynamic decision making.
There is a need for innovative solutions to develop, manage and analyse big data on the growing network of infected
subjects, patient details and their community movements, and to integrate it with clinical trials and pharmaceutical, genomic
and public health data [6]. Multiple sources of data, including text messages, online communications, social media
and web articles, can be very helpful in analyzing the growth of infection with community behaviour. Wrapping
this data with ML and Artificial Intelligence (AI), researchers can forecast where and when the disease is likely to
spread, and notify those regions to make the required arrangements. The travel history of infected subjects can be tracked
automatically to study epidemiological correlations with the spread of the disease. Some community transmission
based effects have been studied in other works (https://www.cdc.gov/mmwr/volumes/69/wr/mm6915e1.htm).
Infrastructure for the storage and analytics of such huge data for further processing needs to be developed in an efficient
and cost-effective manner. This needs to be organized through the utilization of cloud computing and AI solutions [7].
Alibaba developed cloud and AI solutions to help China fight against the coronavirus and predict the peak, size and duration
of the outbreak, which is claimed to have been implemented with 98% accuracy in real-world tests in various regions
of China [8]. Different types of pneumonia can be distinguished using an ML-based CT Image Analytics Solution, which
can be helpful in monitoring patients with COVID-19 [9]. Details can be seen at
https://spectrum.ieee.org/the-human-os/biomedical/imaging/hospitals-deploy-ai-tools-detect-covid19-chest-scans. The
development of a vaccine for COVID-19 can also be accelerated by analysing genome sequences and molecular
docking, deploying various ML and AI techniques [10].
Motivation and Our Contributions: ML [11] can be utilized to handle large data and intelligently predict the
spread of the disease. Cloud computing [12] can be used to rapidly enhance the prediction process using high-speed
computations [7]. Novel energy-efficient edge systems can be used to procure data, in order to bring down power
consumption. In this paper, we present a prediction model deployed using the FogBus framework [13] for accurate
prediction of the number of COVID-19 cases, the rise and fall of the number of cases in the near future and the date
when various countries may expect the pandemic to end. We also provide a detailed comparison with a baseline model
and show how catastrophic the effects can be if poorly fitting models are used. We present a prediction scheme based
on the ML model, which can be used in remote cloud nodes for real-time prediction allowing governments and citizens
to respond proactively. Finally, we summarize this work and present various research directions.
Article structure: The rest of the paper is organized as follows: Section 2 presents the prediction model and
performance comparison. Section 3 concludes the work and describes the future research opportunities. Section
4 provides details of open repositories for the dataset, code and results.
2 Prediction Model and Performance Comparison
The Machine Learning (ML) and Data Science communities are striving hard to improve the forecasts of epidemiological
models and to analyze the information flowing over Twitter for the development of management strategies and the
assessment of the impact of policies to curb the spread. Various datasets in this regard have been openly released to the
public. Yet, there is a need to capture, develop and analyse more data as COVID-19 grows worldwide [14, 15].
The novel coronavirus is leaving a deep socio-economic impact globally. Due to the ease of virus transmission, primarily
through droplets of saliva or discharge from the nose when an infected person coughs or sneezes, countries which are
densely populated need to be on a higher alert [16]. To gain more insight on how COVID-19 is impacting the world
population and to predict the number of COVID-19 cases and dates when the pandemic may be expected to end in
various countries, we propose a Machine Learning model that can be run continuously on Cloud Data Centers (CDCs)
for accurate spread prediction and proactive development of strategic response by the government and citizens.
Dataset: The dataset used in this case study is the Our World in Data COVID-19 dataset by Hannah Ritchie¹. The dataset is updated
daily from the World Health Organization (WHO) situation reports². More details about the dataset are available at:
https://ourworldindata.org/coronavirus-source-data.
1 Our World In Data: COVID-19 Dataset; source: https://github.com/owid/covid-19-data/tree/master/public/data/
2 Situation Reports: WHO; source: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports
[Diagram labels recovered from the figure: Cloud, AI Analytics Engine, Prediction, Enterprise COVID Databases, End users, Gateway, Patients, Devices, Government Hospitals, Private Centers.]
Figure 1: Proposed Cloud based AI framework for COVID-19 related analytics.
Cloud framework: The ML models are built to make good advance predictions of the number of new cases and
the dates when the pandemic might end. To provide fail-safe computation and quick data analysis, we propose a
framework to deploy these models on cloud datacenters, as shown in Figure 1. In a cloud-based environment, the
government hospitals and private health centers continuously send their positive patient counts. Population density,
average and median age, weather conditions, health facilities, etc. are also to be integrated to enhance the accuracy
of the predictions. For this case study, we used three instances of single-core Azure B1s virtual machines with 1 GiB
RAM, SSD storage and 64-bit Microsoft Windows Server 2016¹. We used the HealthFog [11] framework, leveraging
FogBus [13], to deploy multiple analysis tasks in an ensemble learning fashion to predict various metrics, like
the anticipated number of facilities needed to manage patients and the hospitals. We found that, for tracking patients
on a daily basis, the amortized CPU consumption is 37% and the cost of Cloud execution is only 1.2 USD per day. As the dataset
size increases, computationally more powerful resources would be needed.
ML model: Many recent works have suggested that the COVID-19 spread follows an exponential distribution [17, 18, 19].
As per empirical evaluations and previous datasets on the SARS-CoV-1 virus pandemic, many sources have shown that data
corresponding to new cases over time has a large number of outliers and may or may not follow a standard distribution like
Gaussian or Exponential [20, 21, 22, 23]. In a recent study by the Data-Driven Innovation Laboratory, Singapore University
of Technology and Design (SUTD)³, regression curves were drawn using the Susceptible-Infected-Recovered
model [24] and a Gaussian distribution was deployed to estimate the number of cases over time. However, in
previously reported studies on the earlier version of the virus, namely SARS-CoV-1, the data was reported to follow the
Generalized Inverse Weibull (GIW) distribution [25] better than the Gaussian, as shown in Figure 2 (details of Robust
Weibull fitting follow in the next section). A detailed comparison for SARS-CoV-2 is described in the next section.
The GIW model fits the following function to the data:
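$$f(x) = k \cdot \gamma \cdot \beta \cdot \alpha^{\beta} \cdot x^{-1-\beta} \cdot \exp\left(-\gamma \left(\frac{\alpha}{x}\right)^{\beta}\right). \tag{1}$$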
Here, f(x) denotes the number of cases at time x, where x > 0 is the time in number of days from the first case, and
α, β, γ ∈ R with α, β, γ > 0 are parameters of the model. Now, we can find the appropriate values of the parameters α, β and γ to
minimize the error between the predicted cases (y = f(x)) and the actual cases (ŷ). This can be done using the popular
Machine Learning technique of Levenberg-Marquardt (LM) for curve fitting [26]. However, as various sources have
suggested, in the initial stages of COVID-19 the data has many outliers and much noise. This makes it hard to accurately predict
the number of cases. Thus, we propose an iterative weighting strategy and call our fitting technique "Robust Fitting". A
diagrammatic representation of the iterative weighting process is shown in Figure 3.
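To make Equation (1) concrete, the following minimal sketch evaluates the curve for a given day. The function names and the parameter values are illustrative assumptions and are not taken from the paper or its repository. The peak-day formula is also an addition, obtained by setting the derivative of log f(x) to zero.

```python
import math


def giw_cases(x, k, alpha, beta, gamma):
    """Equation (1): scaled Generalized Inverse Weibull curve at day x > 0."""
    if x <= 0:
        raise ValueError("x must be positive (days since the first case)")
    return (k * gamma * beta * alpha**beta * x ** (-1.0 - beta)
            * math.exp(-gamma * (alpha / x) ** beta))


def giw_peak_day(alpha, beta, gamma):
    """Day at which Equation (1) is largest, from d/dx log f(x) = 0."""
    return alpha * (gamma * beta / (1.0 + beta)) ** (1.0 / beta)


# Made-up parameter values for illustration; not fitted to any real data.
k, alpha, beta, gamma = 10_000.0, 30.0, 2.0, 1.0
peak = giw_peak_day(alpha, beta, gamma)
print(f"f(30) = {giw_cases(30.0, k, alpha, beta, gamma):.2f}")
print(f"peak near day {peak:.1f}")
```

The curve rises quickly, peaks once and then decays with a heavy right tail, which is the shape the fitting procedure relies on.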
The main idea is as follows. We maintain weights for all data points (i) in every iteration (n, starting from 0) as $w_i^n$.
First, we fit a curve using the LM technique with weights of all data points as 1, thus $w_i^0 = 1 \;\forall\, i$. Second, we find the
weight corresponding to every point for the next iteration ($w_i^{n+1}$) as:
1 Azure Cloud VMs: https://azure.microsoft.com/en-au/pricing/calculator/
3 When Will COVID-19 End, DDI Lab, SUTD: https://ddi.sutd.edu.sg/when-will-covid-19-end
Figure 2: Fit curves for SARS-CoV-1 pandemic for Hong Kong (SAR), China. Data source: WHO epidemic curves
(https://www.who.int/csr/sars/epicurve/epiindex/en/index4.html)
Figure 3: Iterative weighting technique for robust curve fitting.
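$$w_i^{n+1} = \frac{\exp\left(1 - \dfrac{d_i^n - \tanh(d_i^n)}{\max_i \left(d_i^n - \tanh(d_i^n)\right)}\right)}{\sum_i \exp\left(1 - \dfrac{d_i^n - \tanh(d_i^n)}{\max_i \left(d_i^n - \tanh(d_i^n)\right)}\right)}. \tag{2}$$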
Simply put, in the above equation, we first apply the tanhshrink function, defined as tanhshrink(x) = x − tanh(x), to the
distances of all points from the curve along the y axis ($d_i$). This gives a higher value for points far from the curve and
a value near 0 for closer points. This value is then normalized by dividing by its maximum over all points and subtracted from
1 to get a score for each point. The scores are then normalized using the softmax function so that the sum of
all weights is 1. The curve is fit again using the LM method, now with the new weights $w_i^{n+1}$. The algorithm converges
when the total change in the weights between successive iterations, $\sum_i |w_i^n - w_i^{n+1}|$, falls below a threshold value.
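The weight update of Equation (2) can be sketched in a few lines. This is an illustrative reading of the formula, not the authors' code. The function names, the sample distances and the handling of the case where every point lies exactly on the curve are assumptions.

```python
import math


def tanhshrink(d):
    """tanhshrink(d) = d - tanh(d): near 0 for small d, close to d - 1 for large d."""
    return d - math.tanh(d)


def update_weights(distances):
    """Equation (2): next-iteration weights from absolute vertical distances."""
    shrunk = [tanhshrink(d) for d in distances]
    largest = max(shrunk)
    if largest == 0.0:
        # Assumption: if every point lies on the curve, use uniform weights.
        return [1.0 / len(distances)] * len(distances)
    scores = [math.exp(1.0 - s / largest) for s in shrunk]
    total = sum(scores)
    return [s / total for s in scores]


# Four made-up distances; the last point is far from the curve.
weights = update_weights([0.5, 2.0, 1.0, 40.0])
print(weights)
```

Because every score lies between exp(0) and exp(1), the smallest weight is never less than 1/e of the largest. The update therefore down-weights outliers without discarding them.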
Distribution Model Selection: To find the best fitting distribution model for the data corresponding to COVID-19, we
studied the data on daily new confirmed COVID-19 cases. Five sets of global data on daily new COVID-19 cases were
used to fit the parameters of different types of distributions. Finally, we identified the 5 best performing distributions. The
results are shown in Table 1. We observe that, using the iteratively weighted approach, the Inverse Weibull function fits
the COVID-19 dataset best, as compared to the iterative versions of the Gaussian, Beta (4-parameter), Fisher-Tippett
(Extreme Value distribution) and Log Normal functions. When applied to the same dataset, the iteratively weighted Weibull showed an
average MAPE 12% lower than the non-iteratively weighted Weibull. A step-by-step algorithm for iteratively weighted
curve fitting using the GIW distribution (called "Robust Weibull") is given in Algorithm 1.
Analysis and Interpretation: To evaluate the proposed "Robust Weibull fitting" model, we compare it with the baseline proposed by
Jianxi Luo from SUTD³. The comparison metrics include Mean Squared Error (MSE), Mean Absolute Percentage Error
(MAPE) and the coefficient of determination (R²). Table 2 shows the model predictions of the spread of COVID-19
for every major country for which sufficient data was available and model fits had R² > 0.5 using the proposed model.
As shown in the table, the proposed model performs better than the baseline for most countries.
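The three comparison metrics can be written out as below, using their standard textbook definitions. The paper does not state how days with zero reported cases are treated in MAPE, so skipping them is an assumption of this sketch. The sample numbers are made up.

```python
def mse(actual, predicted):
    """Mean Squared Error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Assumption: days with zero actual cases are skipped, because the
    percentage error is undefined there.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up daily counts and predictions for illustration.
actual = [100.0, 200.0, 400.0, 300.0]
predicted = [110.0, 190.0, 380.0, 330.0]
print(mse(actual, predicted), mape(actual, predicted), r_squared(actual, predicted))
```

MAPE divides by the actual count, so it can become very large for countries with many low-count days even when MSE and R² look reasonable.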
MSE
Country          Weibull    Gaussian   Beta 4     Fisher-Tippett   Log Normal
World            2.41E+07   3.78E+07   2.99E+07   2.89E+07         2.99E+07
India            6.97E+03   7.09E+03   6.89E+03   6.89E+03         7.00E+03
United States    8.37E+06   1.11E+07   8.63E+05   9.47E+06         9.78E+06
United Kingdom   2.00E+05   2.22E+05   2.12E+05   2.02E+05         2.07E+05
Italy            1.56E+05   3.38E+05   2.10E+05   2.09E+05         2.35E+05

R²
Country          Weibull    Gaussian   Beta 4     Fisher-Tippett   Log Normal
World            0.98       0.97       0.98       0.97             0.97
India            0.97       0.97       0.98       0.97             0.97
United States    0.95       0.93       0.94       0.93             0.94
United Kingdom   0.95       0.95       0.95       0.95             0.95
Italy            0.96       0.92       0.95       0.95             0.94

MAPE
Country          Weibull    Gaussian   Beta 4     Fisher-Tippett   Log Normal
World            49.14      49.14      50.39      48.12            46.19
India            18.33      18.33      18.49      21.69            20.69
United States    24.33      24.33      40.23      71.64            111.63
United Kingdom   21.46      21.46      20.43      21.52            17.42
Italy            14.98      14.98      20.00      19.62            170.63

Table 1: Preliminary Evaluation of different models. We observe that iterative fitting of Inverse Weibull performs
significantly better than iterative fitting of other distributions like Gaussian, Beta (4-parameter), Fisher-Tippett (Extreme
Value distribution), and Log Normal. The lowest value of MSE/MAPE and highest values of R² among all distributions
are shown in bold.
Algorithm 1 Robust Curve Fitting using Iterative weighting

Require:
    x : Input sequence of days from first case
    y : Number of cases for each day in x
    ε : Threshold parameter
procedure ROBUSTCURVEFITTING
    $w^0$ ← Unit vector [1] × size(x)
    for iteration n from 0, step 1 do
        f ← LM(input = x, target = y, weights = $w^n$)
        $d_i^n$ ← |f($x_i$) − $y_i$| ∀ i
        $w_i^{n+1}$ ← $\dfrac{\exp\left(1 - \frac{d_i^n - \tanh(d_i^n)}{\max_i (d_i^n - \tanh(d_i^n))}\right)}{\sum_i \exp\left(1 - \frac{d_i^n - \tanh(d_i^n)}{\max_i (d_i^n - \tanh(d_i^n))}\right)}$ ∀ i
        if $\sum_i |w_i^n - w_i^{n+1}|$ < ε then
            break
    end for
end procedure
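Algorithm 1 could be sketched in Python as follows. This is one possible reading under stated assumptions, not the authors' released implementation. It uses SciPy's `curve_fit` with `method="lm"` as the LM solver and passes the weights through `sigma = 1 / sqrt(w)`. It fits the logarithms of the parameters to keep them positive, warm-starts each fit from the previous one and caps the number of iterations. The synthetic data, starting guess and function names are all made up.

```python
import warnings

import numpy as np
from scipy.optimize import OptimizeWarning, curve_fit


def giw(x, log_k, log_alpha, log_beta, log_gamma):
    """Equation (1), parameterized by logs so k, alpha, beta, gamma stay positive."""
    k, alpha, beta, gamma = np.exp([log_k, log_alpha, log_beta, log_gamma])
    return (k * gamma * beta * alpha**beta * x ** (-1.0 - beta)
            * np.exp(-gamma * (alpha / x) ** beta))


def robust_fit(x, y, p0, eps=1e-6, max_iter=50):
    """Algorithm 1: alternate weighted LM fits with the Equation (2) update."""
    log_params = np.log(np.asarray(p0, dtype=float))
    w = np.ones(len(x))                       # w^0: every point has weight 1
    for _ in range(max_iter):                 # max_iter is a safety cap (assumption)
        with warnings.catch_warnings():
            # alpha and gamma enter only through gamma * alpha**beta, so the
            # parameter covariance is singular; silence SciPy's warning about it.
            warnings.simplefilter("ignore", OptimizeWarning)
            log_params, _cov = curve_fit(
                giw, x, y, p0=log_params,
                sigma=1.0 / np.sqrt(w),       # weight w_i corresponds to sigma_i = w_i ** -0.5
                method="lm", maxfev=20000,
            )
        d = np.abs(giw(x, *log_params) - y)   # vertical distances d_i^n
        shrunk = d - np.tanh(d)               # tanhshrink
        scores = np.exp(1.0 - shrunk / max(shrunk.max(), 1e-12))
        w_next = scores / scores.sum()        # weights sum to 1
        change = np.abs(w - w_next).sum()
        w = w_next
        if change < eps:
            break
    return np.exp(log_params), w


# Synthetic example: a made-up "true" curve with two injected outliers.
x = np.arange(1.0, 101.0)
clean = giw(x, *np.log([10_000.0, 30.0, 2.0, 1.0]))
y = clean.copy()
y[10] += 150.0
y[60] += 100.0
params, weights = robust_fit(x, y, p0=[8_000.0, 25.0, 1.8, 1.0])
fitted = giw(x, *np.log(params))
print("largest deviation from the clean curve:", np.abs(fitted - clean).max())
print("least-trusted day:", x[np.argmin(weights)])
```

Since α and γ appear in Equation (1) only through the product γ·α^β, their individual fitted values depend on the starting guess. The fitted curve itself, and any prediction derived from it, is unaffected.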
As shown in Figure 4⁴, the predictions of the baseline Gaussian model deployed by SUTD are over-optimistic. Following
such models could lead to premature lifting of lockdowns, adversely affecting management of the epidemic.
Better-fitting models, as proposed here, could help plan a better strategy based on more accurate predictions and
future scenarios.
Figure 5 shows the total predicted number of cases for all countries across the globe. Here we have neglected those
countries where the data is insufficient for making predictions, or where fewer than 30 days of data are available. As illustrated
in Figure 4, the fit curve can be used to predict the number of cases that a country will have to deal
with, assuming the same trend continues. Figure 5 illustrates that the maximum number of total cases will
be in the North America region. The number of cases will also be high in the European continent, Russia and eastern
Asia, including China, the original epicenter of the disease.
[Figure 4 plots: New Cases versus Date for (i) World and (ii) United States, each showing the Robust Weibull prediction, the Gaussian prediction and the actual data. Annotated dates for reaching 97% of total predicted cases: World, 25 May 2020 (Gaussian) and 11 Oct 2020 (Robust Weibull); United States, 19 May 2020 (Gaussian) and 14 Aug 2020 (Robust Weibull).]

Figure 4: Comparison of predicted dates to reach 97% of the total expected cases by baseline Gaussian and proposed
Robust Weibull models. The predicted end dates of the pandemic in the baseline model are over-optimistic.
4 Curves and predictions of all countries are given in the Appendix.
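(Columns 2 to 4: predictions of the Robust Weibull model. Columns 5 to 10: fit comparison metrics, with W for Robust Weibull and G for baseline Gaussian.)
Country | Total Cases | Date of last case | 97% cases date | MSE (W) | MSE (G) | R² (W) | R² (G) | MAPE (W) | MAPE (G)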
United States 1,937,724 11-Feb-22 14-Aug-20 9.32E+06 1.33E+07 0.95 0.92 26.58 1568.56
Russia 529,687 27-Nov-21 26-Sep-20 5.50E+04 5.92E+04 0.99 0.98 24.53 75.91
India 409,418 29-Oct-24 13-Aug-21 8.40E+03 9.11E+03 0.97 0.97 22.38 80.47
United Kingdom 331,124 31-Jul-21 18-Aug-20 2.54E+05 3.19E+05 0.95 0.93 20.14 211.72
Ukraine 254,087 10-Dec-41 18-Jan-31 3.31E+04 3.37E+04 0.53 0.52 1842.80 2079.70
Italy 253,022 7-Mar-21 27-Jun-20 1.52E+05 3.55E+05 0.96 0.91 14.55 1577.95
Spain 236,737 30-Sep-20 20-Apr-20 4.67E+05 6.59E+05 0.93 0.90 3682.04 2917.90
Turkey 234,218 22-Jun-23 30-Dec-20 2.27E+05 1.49E+05 0.92 0.94 30.95 555.87
Germany 181,369 17-Oct-20 2-May-20 3.39E+05 4.50E+05 0.91 0.88 1013.65 582.89
France 147,795 11-Oct-20 29-May-20 3.93E+05 4.14E+05 0.84 0.83 32.36 134.36
Qatar 143,779 1-Oct-22 18-Mar-21 3.71E+03 3.46E+03 0.93 0.93 99.14 90.02
Canada 139,331 7-Dec-21 31-Oct-20 2.12E+04 2.64E+04 0.95 0.94 28.19 210.56
Belarus 135,375 4-Jun-22 2-Feb-21 1.28E+04 1.28E+04 0.83 0.83 1040.39 1101.48
Iran 126,048 12-Mar-21 15-Jul-20 2.11E+05 1.88E+05 0.78 0.80 1847.80 2313.85
China 84,171 6-Jul-20 27-Mar-20 1.40E+06 1.28E+06 0.48 0.53 114.04 202.88
Sweden 68,671 25-Apr-22 15-Feb-21 5.33E+03 5.45E+03 0.91 0.91 20.55 151.04
Belgium 65,257 19-Nov-20 27-Jun-20 3.88E+04 4.10E+04 0.88 0.88 18.76 134.34
Bangladesh 53,127 19-Apr-22 22-Feb-21 1.38E+03 1.60E+03 0.96 0.96 30.89 118.80
Netherlands 53,057 28-Nov-20 2-Jul-20 1.07E+04 1.10E+04 0.94 0.94 16.45 140.98
United Arab Emirates 46,395 18-Jul-21 5-Nov-20 3.30E+03 3.49E+03 0.91 0.90 840.72 947.02
Portugal 37,302 7-Jun-21 12-Sep-20 2.62E+04 3.20E+04 0.75 0.70 41.46 222.40
Indonesia 35,581 19-Sep-21 20-Dec-20 1.24E+03 1.22E+03 0.93 0.93 51.96 124.06
Poland 35,113 22-Nov-22 1-Aug-21 3.08E+03 3.42E+03 0.87 0.86 29.90 110.45
Switzerland 31,407 26-Jul-20 13-May-20 1.14E+04 1.39E+04 0.92 0.90 383.82 476.28
Bahrain 30,258 21-Mar-23 12-Jan-22 1.05E+03 1.04E+03 0.57 0.57 98.32 102.20
Ireland 27,694 6-Sep-20 12-Jun-20 1.17E+04 9.49E+03 0.84 0.87 25.91 21.83
Singapore 24,088 19-Jul-20 28-May-20 1.68E+04 1.69E+04 0.82 0.82 912.31 1018.38
Dominican Republic 22,193 19-Jun-21 29-Apr-20 1.79E+03 1.85E+03 0.81 0.81 304.96 420.08
Romania 22,102 24-Dec-20 10-Aug-20 1.98E+03 2.29E+03 0.91 0.90 16.83 87.83
Algeria 19,188 6-Feb-22 16-May-21 3.59E+02 3.96E+02 0.86 0.85 61.43 147.61
Israel 18,167 3-Aug-20 26-May-20 8.81E+03 1.03E+04 0.80 0.77 37.91 137.87
Japan 17,614 27-Jul-20 29-May-20 1.12E+04 1.08E+04 0.74 0.75 162.70 202.00
Morocco 16,972 24-May-22 2-Aug-21 1.60E+03 1.40E+03 0.69 0.73 188.07 171.78
Serbia 16,426 24-Jan-21 27-Aug-20 2.49E+03 2.36E+03 0.87 0.87 210.28 229.10
Austria 15,781 9-Jun-20 30-Apr-20 4.07E+03 5.33E+03 0.92 0.89 23.08 34.08
Philippines 14,371 24-Nov-20 2-Apr-20 5.08E+03 5.49E+03 0.65 0.62 543.57 698.02
Denmark 13,282 26-Oct-20 17-Jul-20 1.94E+03 1.81E+03 0.81 0.82 18.95 104.47
Moldova 12,818 6-Feb-22 12-Jun-21 8.78E+02 9.57E+02 0.75 0.73 36.51 68.69
Hungary 11,077 19-Jul-22 22-Nov-21 5.85E+02 5.66E+02 0.64 0.65 49.55 71.00
South Korea 10,780 4-May-20 2-Apr-20 3.35E+03 3.88E+03 0.87 0.85 55.81 68.84
Finland 9,158 21-Dec-20 5-Sep-20 9.09E+02 9.11E+02 0.74 0.74 125.43 188.74
Norway 8,534 23-Jul-20 19-Apr-20 1.73E+03 1.79E+03 0.80 0.79 187.65 211.88
Czech Republic 8,528 14-Jul-20 22-May-20 1.34E+03 1.56E+03 0.85 0.83 20.11 59.31
Malaysia 7,080 6-Aug-20 7-Jun-20 4.88E+02 5.73E+02 0.89 0.87 30.30 112.20
Australia 6,797 17-May-20 21-Apr-20 2.54E+03 2.78E+03 0.81 0.79 31.85 36.77
Oman 4,871 4-Sep-20 23-Apr-20 6.01E+02 5.98E+02 0.66 0.66 229.07 232.19
Iraq 4,113 20-Nov-20 20-Apr-20 4.99E+02 5.21E+02 0.47 0.45 299.89 354.05
Luxembourg 3,887 29-May-20 2-May-20 5.15E+02 6.64E+02 0.83 0.79 49.42 127.81
Thailand 3,044 30-May-20 2-Apr-20 8.51E+02 9.01E+02 0.63 0.61 381.04 399.02
Greece 2,944 7-Jul-20 28-Apr-20 3.69E+02 3.67E+02 0.66 0.66 137.87 127.28
Croatia 2,275 15-Jun-20 18-May-20 8.44E+01 1.04E+02 0.88 0.85 20.64 45.14
World 6,734,075 29-Jan-24 11-Oct-20 2.91E+07 4.92E+07 0.98 0.96 47.53 63.36
Table 2: Predictions and error comparisons. Country-wise predictions using the Robust Weibull model and error
comparison between the Robust Weibull and baseline Gaussian models. We predict the total number of cases that will be
reached, and the last case date, i.e. when the model predicts new cases < 1. We also predict the date when the total
number will reach 97% of the total expected cases. Such data is critical to prepare the healthcare services in advance.
The fit comparison metrics (with the proposed model as W and the baseline model as G) show that the Mean Squared Error (MSE)
and the Mean Absolute Percentage Error (MAPE) of the proposed model are lower than the baseline in most cases. The
coefficient of determination (R²) is higher for the proposed model for most of the countries. The lowest MSE/MAPE and
highest R² values among the two models are shown in bold. Data up to 4 May 2020 was used to create these results.
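The prediction columns of Table 2 can be related back to Equation (1) with a short sketch. It assumes a continuous approximation in which the cumulative count up to day x is k·exp(−γ(α/x)^β), so that the total approaches k. The paper may instead sum daily predictions. The parameter values, the first-case date and the function names are hypothetical.

```python
import math
from datetime import date, timedelta


def giw_cases(x, k, alpha, beta, gamma):
    """Equation (1): predicted cases on day x."""
    return (k * gamma * beta * alpha**beta * x ** (-1.0 - beta)
            * math.exp(-gamma * (alpha / x) ** beta))


def day_reaching_fraction(p, alpha, beta, gamma):
    """Day on which the cumulative curve k*exp(-gamma*(alpha/x)**beta) reaches p*k."""
    return alpha * (gamma / -math.log(p)) ** (1.0 / beta)


def last_case_day(k, alpha, beta, gamma, horizon=100_000):
    """First whole day after the peak on which Equation (1) predicts fewer than one case."""
    peak = alpha * (gamma * beta / (1.0 + beta)) ** (1.0 / beta)
    for x in range(max(1, math.ceil(peak)), horizon):
        if giw_cases(x, k, alpha, beta, gamma) < 1.0:
            return x
    return None  # no such day within the search horizon


# Made-up parameters and a hypothetical first-case date, for illustration only.
k, alpha, beta, gamma = 10_000.0, 30.0, 2.0, 1.0
first_case = date(2020, 1, 1)
day_97 = day_reaching_fraction(0.97, alpha, beta, gamma)
day_last = last_case_day(k, alpha, beta, gamma)
print("97% of cases by:", first_case + timedelta(days=math.ceil(day_97)))
print("last case on:", first_case + timedelta(days=day_last))
```

The heavy right tail of the curve explains why the last-case dates in Table 2 fall so much later than the 97% dates.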
Figure 5: Global heat-map for total predicted cases for different countries as on May 4, 2020 (countries with insufficient data for prediction are shown in white)
[Figure 6 diagram, "Research Opportunities and Emerging Trends": Risk Assessment; Contact-less Treatment and Delivery using AI based Robots; Predict the structure of Proteins of COVID-19; Analyze Social Media Data using AI; Predict the Symptoms from Visual Imaging; Find the Movement Patterns of the People; Efficient Data Processing; Climate Change.]
Figure 6: Future Research Directions and Open Challenges

3 Conclusions and Future Directions
In this study, we have discussed how improved mathematical modelling, Machine Learning and cloud computing can
help to predict the growth of the epidemic proactively. Further, a case study has been presented which shows the severity
of the spread of CoV-2 in countries worldwide. Using the proposed Robust Weibull model based on iterative weighting,
we show that our model is able to make statistically better predictions than the baseline. The baseline Gaussian model
shows an over-optimistic picture of the COVID-19 scenario. A poorly fitting model could lead to non-optimal decision
making, worsening the public health situation.
We propose the following future directions. Firstly, other important parameters like population density, distribution of
age, individual and community movements, level of healthcare facilities available, strain type and virulence of the virus,
etc. need to be included in the regression model to further enhance the prediction accuracy. Secondly, models like
ARIMA [27] can be integrated with the Weibull function for further time series analysis and predictions. Thirdly, ML can
be utilized to predict the structure and function of various proteins associated with CoV-2 and their interaction with the
host human proteins and cellular environment. The contribution of various socio-economic variables that determine the
vulnerability, spread and progression of the epidemic can be predicted by developing suitable algorithms. AI-based
proactive measures can be taken to prevent the spread of the virus to sensitive groups in society. Real-time sensors,
for example in traffic cameras or surveillance systems, can be used to track COVID-19 symptoms based on visual imaging
and tracking apps, and to inform the respective hospitals and administrative authorities for punitive action [28]. Tracking
needs to cover all stages from ports of entry to public places and hospitals [29]. The research directions and challenges
are summarized in Figure 6.
4 Software Availability
Our prediction model is available online at https://github.com/shreshthtuli/covid-19-prediction. The
dataset used for this work is the Our World in Data dataset, available at
https://github.com/owid/covid-19-data/tree/master/public/data/. A few interactive graphs can be seen at
https://collaboration.coraltele.com/covid/.
References
[1] Chen Wang, Peter W Horby, Frederick G Hayden, and George F Gao. A novel coronavirus outbreak of global
health concern. The Lancet, 395(10223):470–473, 2020.
[2] Coronavirus - worldometer, link: https://www.worldometers.info/coronavirus/, [online accessed] 12
April 2020.
[3] Guangdi Li and Erik De Clercq. Therapeutic options for the 2019 novel coronavirus (2019-ncov), 2020.
[4] Smriti Mallapaty. What the cruise-ship outbreaks reveal about covid-19. Nature, 580(7801):18–18, 2020.
[5] Kai Liu, Ying Chen, Ruzheng Lin, and Kunyuan Han. Clinical features of covid-19 in elderly patients: A
comparison with young and middle-aged patients. Journal of Infection, 2020.
[6] Shi Zhao, Qianyin Lin, Jinjun Ran, Salihu S Musa, Guangpu Yang, Weiming Wang, Yijun Lou, Daozhou Gao, Lin
Yang, Daihai He, et al. Preliminary estimation of the basic reproduction number of novel coronavirus (2019-ncov)
in china, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak. International Journal of
Infectious Diseases, 92:214–217, 2020.
[7] Shreshth Tuli, Shikhar Tuli, Gurleen Wander, Praneet Wander, Sukhpal Singh Gill, Schahram Dustdar, Rizos
Sakellariou, and Omer Rana. Next generation technologies for smart healthcare: Challenges, vision, model, trends
and future directions. Internet Technology Letters, page e145.
[8] Chaolin Huang, Yeming Wang, Xingwang Li, Lili Ren, Jianping Zhao, Yi Hu, Li Zhang, Guohui Fan, Jiuyang
Xu, Xiaoying Gu, et al. Clinical features of patients infected with 2019 novel coronavirus in wuhan, china. The
Lancet, 395(10223):497–506, 2020.
[9] Adrien Depeursinge, Anne S Chin, Ann N Leung, Donato Terrone, Michael Bristow, Glenn Rosen, and Daniel L
Rubin. Automated classification of usual interstitial pneumonia using regional volumetric texture analysis in
high-resolution ct. Investigative radiology, 50(4):261, 2015.
[10] Maxwell W Libbrecht and William Stafford Noble. Machine learning applications in genetics and genomics.
Nature Reviews Genetics, 16(6):321–332, 2015.
[11] Shreshth Tuli, Nipam Basumatary, Sukhpal Singh Gill, Mohsen Kahani, Rajesh Chand Arya, Gurpreet Singh
Wander, and Rajkumar Buyya. Healthfog: An ensemble deep learning based smart healthcare system for automatic
diagnosis of heart diseases in integrated iot and fog computing environments. Future Generation Computer
Systems, 104:187–200, 2020.
[12] Sukhpal Singh Gill, Shreshth Tuli, Minxian Xu, Inderpreet Singh, Karan Vijay Singh, Dominic Lindsay, Shikhar
Tuli, Daria Smirnova, Manmeet Singh, Udit Jain, et al. Transformative effects of iot, blockchain and artificial
intelligence on cloud computing: Evolution, vision, trends and open challenges. Internet of Things, 8:100118,
2019.
[13] Shreshth Tuli, Redowan Mahmud, Shikhar Tuli, and Rajkumar Buyya. Fogbus: A blockchain-based lightweight
framework for edge and fog computing. Journal of Systems and Software, 2019.
[14] Emily Chen, Kristina Lerman, and Emilio Ferrara. Covid-19: The first public coronavirus twitter dataset. arXiv
preprint arXiv:2003.07372, 2020.
[15] Joseph Paul Cohen, Paul Morrison, and Lan Dao. Covid-19 image data collection. arXiv preprint
arXiv:2003.11597, 2020.
[16] Yan Bai, Lingsheng Yao, Tao Wei, Fei Tian, Dong-Yan Jin, Lijuan Chen, and Meiyun Wang. Presumed
asymptomatic carrier transmission of covid-19. Jama, 2020.
[17] Benjamin F Maier and Dirk Brockmann. Effective containment explains sub-exponential growth in confirmed
cases of recent covid-19 outbreak in mainland china. arXiv preprint arXiv:2002.07572, 2020.
[18] Yi Li, Meng Liang, Xianhong Yin, Xiaoyu Liu, Meng Hao, Zixin Hu, Yi Wang, and Li Jin. Covid-19 epidemic
outside china: 34 founders and exponential growth. medRxiv, 2020.
[19] Mary Raygoza. Covid-19, exponential growth, and the power of showing up in social solidarity: The math behind
the virus. 2020.
[20] Yanping Bai and Zhen Jin. Prediction of sars epidemic by bp neural networks with online prediction strategy.
Chaos, Solitons & Fractals, 26(2):559–569, 2005.
[21] Ying-Hen Hsieh, Jen-Yu Lee, and Hsiao-Ling Chang. Sars epidemiology modeling. Emerging infectious diseases,
10(6):1165, 2004.
[22] Dejian Lai. Monitoring the sars epidemic in china: a time series analysis. Journal of Data Science, 3(3):279–293,
2005.
[23] Wendi Wang and Shigui Ruan. Simulating the sars outbreak in beijing with limited data. Journal of theoretical
biology, 227(3):369–379, 2004.
[24] David Smith and Lang Moore. The sir model for spread of disease: The differential equation model.
Loci (originally Convergence). https://www.maa.org/press/periodicals/loci/joma/the-sir-model-for-spread-of-disease-the-differential-equation-model, 2004.
[25] Felipe RS De Gusmao, Edwin MM Ortega, and Gauss M Cordeiro. The generalized inverse weibull distribution.
Statistical Papers, 52(3):591–619, 2011.
[26] Jorge J Moré. The levenberg-marquardt algorithm: implementation and theory. In Numerical analysis, pages
105–116. Springer, 1978.
[27] Domenico Benvenuto, Marta Giovanetti, Lazzaro Vassallo, Silvia Angeletti, and Massimo Ciccozzi. Application
of the arima model on the covid-2019 epidemic dataset. Data in brief, page 105340, 2020.
[28] Halgurd S Maghdid, Kayhan Zrar Ghafoor, Ali Safaa Sadiq, Kevin Curran, and Khaled Rabie. A novel ai-enabled
framework to diagnose coronavirus covid 19 using smartphone embedded sensors: Design study. arXiv preprint
arXiv:2003.07434, 2020.
[29] Shuai Wang, Bo Kang, Jinlu Ma, Xianjun Zeng, Mingming Xiao, Jia Guo, Mengjiao Cai, Jingyi Yang, Yaodong
Li, Xiangfei Meng, et al. A deep learning algorithm using ct images to screen for corona virus disease (covid-19).
medRxiv, 2020.
Appendix: Real data from WHO with predicted curves of proposed and baseline models
All data up to 4 May 2020 were used to generate the results.
[Figure 7: New cases for different countries. Each panel plots daily new cases (y-axis: New Cases) against
date (x-axis: Date) and marks the date by which 97% of the total predicted cases is reached. Legend entries
across the panels: Robust Weibull Prediction, Gaussian Prediction, Actual Data.
Panels: (i) World, (ii) United States, (iii) Spain, (iv) Italy, (v) United Kingdom, (vi) Germany, (vii) France,
(viii) Russia, (ix) Iran, (x) China, (xi) Canada, (xii) Belgium, (xiii) Netherlands, (xiv) India,
(xv) Switzerland, (xvi) Portugal, (xvii) Sweden, (xviii) Ireland, (xix) Singapore, (xx) Israel,
(xxi) Australia, (xxii) South Korea, (xxiii) Bangladesh, (xxiv) Denmark, (xxv) Indonesia, (xxvi) Norway,
(xxvii) Luxembourg, (xxviii) Estonia.]
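The panels of Figure 7 mark the date by which 97% of the total predicted cases is reached. As a minimal sketch of how such a date can be read off a fitted curve of daily new cases, the Python snippet below accumulates a curve day by day and returns the first day on which the running total reaches the chosen fraction. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian shape stands in for any fitted model (the Robust Weibull curve could be passed in the same way), and the parameters `peak`, `mu`, `sigma`, the start date, and the function names are all hypothetical.

```python
import math
from datetime import date, timedelta


def gaussian_new_cases(day, peak, mu, sigma):
    """Daily new cases modelled as a Gaussian bump (hypothetical parameters)."""
    return peak * math.exp(-0.5 * ((day - mu) / sigma) ** 2)


def day_reaching_fraction(curve, fraction=0.97, horizon=2000):
    """Return the first day index on which the running sum of curve(day)
    reaches `fraction` of the total summed over `horizon` days."""
    daily = [curve(d) for d in range(horizon)]
    total = sum(daily)
    running = 0.0
    for d, value in enumerate(daily):
        running += value
        if running >= fraction * total:
            return d
    raise ValueError("fraction not reached within the horizon")


# Assumed values for illustration only; they are not taken from the paper.
start = date(2019, 12, 31)  # calendar date treated as day 0


def curve(d):
    return gaussian_new_cases(d, peak=6000.0, mu=90.0, sigma=20.0)


day_97 = day_reaching_fraction(curve)
print("97% of total predicted cases reached on", start + timedelta(days=day_97))
```

For a Gaussian curve the result can be cross-checked against the closed form `mu + 1.88 * sigma`, since the 97th percentile of a normal distribution lies about 1.88 standard deviations above the mean.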
Authors' Biographies
Shreshth Tuli is an undergraduate student at the Department of Computer Science and Engineering
at the Indian Institute of Technology - Delhi, India. He is also a co-founder of Qubit Inc., a
company that works on next-generation solutions for industrial problems. He is a
national-level Kishore Vaigyanik Protsahan Yojana (KVPY) scholarship holder for excellence
in science and innovation. He has worked as a visiting researcher at the Cloud Computing
and Distributed Systems (CLOUDS) Laboratory, Department of Computing and Information
Systems, the University of Melbourne, Australia. His research interests include Internet of
Things (IoT), Fog Computing, Network Design, and Artificial Intelligence.
Shikhar Tuli is an undergraduate student at the Department of Electrical Engineering at the Indian
Institute of Technology - Delhi, India. He is the founder and CEO of Qubit Inc. He has worked
remotely with the Cloud Computing and Distributed Systems (CLOUDS) Laboratory, Department
of Computing and Information Systems, the University of Melbourne, Australia, on the
realization of the FogBus framework. He has also worked at the Embedded Systems Laboratory,
EPFL, Switzerland, on the design of low-power and physics-optimized edge devices made from
emerging non-volatile memories. His research interests include Internet of Things (IoT),
in-memory and neuromorphic computing architectures, and nanoelectronics. He specializes in
designing novel hardware technologies that are valuable to both industry and academia.
Rakesh Tuli is Senior Research Advisor and J C Bose National Fellow at UIET, Panjab University,
Chandigarh. Before this, he was Executive Director, National Agri-Food Biotech Institute, Mohali,
and Director, National Botanical Research Institute, Lucknow. He has many publications
in reputed journals and conferences, including Nature Biotechnology. His research includes
genomics and transgenic approaches to improving plants for agricultural and health/medicinal
applications.
Sukhpal Singh Gill is a Lecturer (Assistant Professor) in Cloud Computing at the School of
Electronic Engineering and Computer Science, Queen Mary University of London, UK. Prior
to this, Dr. Gill held positions as a Research Associate at the School of Computing and
Communications, Lancaster University, UK, and as a Postdoctoral Research Fellow at the
CLOUDS Laboratory, the University of Melbourne, Australia. His research interests include
Cloud Computing, Fog Computing, Software Engineering, Internet of Things and Big Data.