Volume 6, Issue 11, November – 2021 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Data Analysis and Visualization of COVID-19 Epidemic Based on Python

Weiyi Ma, Dongmei Zhang*
School of Computer Science and Technology
Shandong University of Technology
Zibo City, Shandong Province, China

Abstract:- The new coronavirus pneumonia (COVID-19) that broke out at the end of 2019 was designated by the World Health Organization (WHO) as a "public health emergency of international concern." In the process of epidemic prevention and control, big data and Internet technology have played an important role in the collection, analysis, and release of epidemic data. The purpose of the project is to implement a Python-based data analysis and visualization program for the COVID-19 epidemic. The paper displays the epidemic situation and transmission characteristics of the existing data through a visualization scheme, establishes a dynamic model of infectious diseases, evaluates the prevention and control measures, and makes recommendations and early warnings. In addition, to a certain extent, it can predict the trend of the epidemic and provide a reference for prevention and control decisions and public behavior.

Keywords:- Novel coronavirus pneumonia; COVID-19; Python; data analysis; data visualization.

I. INTRODUCTION

A. Subject Background and Significance

a) Background
Pneumonia caused by a new type of coronavirus was discovered in Wuhan, Hubei in December 2019 and spread rapidly. On February 7, 2020, the National Health Commission (PRC) named it "New Coronavirus Pneumonia", or "New Crown Pneumonia" for short. On February 11, 2020, the International Committee on Taxonomy of Viruses (ICTV) named the virus Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2); on the same day, the World Health Organization (WHO) named the disease it causes Coronavirus Disease 2019 (COVID-19). In 2020, the epidemic broke out in one country after another around the world, with extremely serious impacts on the global economy and society and great harm to human health and life [1].

In epidemic prevention and control, apart from the effective measure of isolation, scientific popularization of daily prevention and control knowledge and timely release of transmission and infection data enable the public to understand the epidemic in a timely manner, take reasonable response measures, and avoid the bad consequences caused by the spread of panic. The collection, sorting, analysis, and visualization of data can be completed well with the help of big data technology. This topic, a Python-based data analysis and visualization program for the COVID-19 epidemic, is proposed against this background.

b) Significance
The new crown epidemic spread rapidly to more than 200 countries and regions around the world within a few months, and as of the beginning of June 2020 there had been more than 6 million patients. China achieved full control of the epidemic in May, and a number of its response measures are worthy of promotion. In the information age, with the help of big data and artificial intelligence technology, it is possible to quickly establish an effective system and mechanism for responding to public health emergencies, in which intuitive and effective data analysis and visualization methods play a pivotal role.

The topic is based on prediction models such as SEIR. Taking the COVID-19 data of Hubei Province as an example, it gives a preliminary analysis of the general law of the COVID-19 epidemic and a prediction analysis. Studying the occurrence, development, and evolution of the epidemic from a macro perspective, and predicting and analyzing it based on big data, are of great significance to strategic decision-making for large-scale prevention and control of infectious diseases and to maintaining social order and stability.

B. Current Research Status at Home and Abroad
In the early stage of the outbreak, many scholars tried to study and analyze the development trend of the new crown epidemic through infectious disease dynamic models.

In January 2020, many scholars predicted the epidemic. Wu, a scholar from Hong Kong, China, used the number of people infected before January 28 to calculate the trend of the epidemic in Wuhan, predicting that the number of infections on January 25 would exceed 6,000. Professor Shen and others from Xi’an Jiaotong University estimated, based on the existing epidemiological data and infectious disease dynamic models and with reference to SARS and other coronaviruses, that the number of SARS-CoV-2 infections would not exceed 20,000 people. This is lower than the epidemic data released on February 7, so the estimate obviously underestimated the infectiousness of the new coronavirus. Professor Xiao from Xi’an Jiaotong University established an infectious disease dynamics model based on domestic and foreign research on the transmission mechanism of the new coronavirus and on strict tracking and isolation measures. The model predicted and analyzed the transmission risk of the new coronavirus pneumonia and indicated that the epidemic would reach its peak in February. However, the

IJISRT21NOV276 www.ijisrt.com 504


current epidemic situation has exceeded the predicted result. In February 2020, some researchers tried to use models such as the SEIR model and the c-SEIR model to infer the "turning point" of the epidemic, but these models did not fully consider factors such as control measures and intensive treatment that cannot be ignored in practice [2].

C. Introduction to Key Technologies

• Introduction to Data Visualization Technology

• Pandas: Pandas is a Python data analysis package whose development began at AQR Capital Management in April 2008 and which was released as open source at the end of 2009. Pandas was originally developed as a financial data analysis tool, so it provides good support for time series analysis. It is a powerful tool set for analyzing structured data, built on NumPy, and is used for data mining and data analysis. It also provides data cleaning functions.

• ECharts and Pyecharts: ECharts is the abbreviation of Enterprise Charts. It is a pure JavaScript chart library that runs smoothly on PCs and mobile devices, is compatible with most current browsers, and provides intuitive, vivid, interactive, and highly customizable data visualization charts. Features such as drag-and-drop recalculation, data views, and range roaming greatly enhance the user experience and enable users to mine and integrate data. Pyecharts is a class library for generating ECharts charts from Python, so that data can be turned into charts directly in Python.

c) Infectious Disease Prediction Model

• SEIR model: The classic SEIR model divides the population into susceptible (S), exposed (E, infected but still in the latent period), infected (I), and recovered (R). The model assumes that every individual in the population can be infected, and that recovered individuals produce antibodies, so the recovered population R will not be infected again.

• Improved SEIR model: Because of the isolation measures used to prevent and control infectious diseases, the population in the model can be further grouped by adding quarantined susceptible persons Sq, quarantined latent persons Eq, and quarantined infected persons Iq. Because quarantined infected people are sent to designated hospitals for isolation treatment, this part of the population is converted into hospitalized patients H in the model [3]. Therefore, in the improved model, S, I, and E refer to the susceptible, infected, and latent people missed by the isolation measures. Quarantined susceptible persons become susceptible again after being released from quarantine, while infected and latent persons can, to different degrees, infect susceptible persons and turn them into latent persons.

d) Data Sources
This project uses web crawlers. The data comes from the real-time epidemic website of Dingxiangyuan, the real-time epidemic website of Tencent, and the epidemic hot-search page of Baidu.

II. SYSTEM DESIGN

A. System Overall Architecture Design
For the convenience of user operation, the system uses the B/S (browser/server) architecture as its basic architecture. For data security and system stability, the application components are deployed separately.

As shown in Figure 1, the user opens the browser, which sends a network request to the back end of the system; the back end receives the request and dispatches it to the view layer. The view layer obtains data from the database through the database model interface, and the data is returned layer by layer. The view layer places the returned data into template variables, and finally the page is displayed to the user.

The data analysis and visualization system is divided into three parts: a data acquisition module, a data analysis module, and an analysis result display module. The platform is developed in Python 3, the Web framework is Flask, and the database is MySQL. Structured data is analyzed with Pandas, unstructured text is segmented with Jieba, and visualization uses ECharts. The development environments are Jupyter Notebook and PyCharm.

Fig. 1: System Architecture Model

B. System Module Design
According to the system requirements analysis and the system architecture design, the system is divided into three modules: data acquisition, data analysis, and the visualization system. The system module structure diagram is shown in Figure 2.

• Data collection module: Use Python crawler technology to crawl data from epidemic data websites, use the Selenium web automation tool to simulate operations in the Chrome browser, and use Requests to obtain web page information. Either manually run the crawler file to obtain

the specified data, or deploy it to the server environment to run automatically.

• Data analysis module: Use Pandas for data reading and grouped aggregation, use Pandas and Seaborn for visual display and simple time series analysis, use Folium for geographic data visualization, use Seaborn to draw heat maps that display hotspots, and use epidemic models such as SEIR to predict the epidemic situation. The results are finally presented as visual charts, such as geographic maps, trend charts, and epidemic forecast simulation charts [4].

• Visualization system module: A large-screen data visualization dashboard based on the Flask framework and ECharts gives users an intuitive understanding of domestic epidemic information, displaying key epidemic data, the national epidemic map, epidemic trend graphs, a ranking of confirmed cases in non-Hubei cities, and a word cloud of epidemic hot searches.

III. SYSTEM IMPLEMENTATION

A. The Realization of the Data Acquisition Module
Use the Selenium web automation tool to simulate operations in the Chrome browser, and use Requests to obtain web page information. Either run the crawler file manually to obtain the specified data, or deploy it to the server environment to run automatically. After the module runs, it updates the detailed epidemic data, the epidemic history data, or the epidemic hot-search data according to the parameters it receives.

The core code is as follows:

    import time
    from selenium.webdriver import Chrome, ChromeOptions

    def get_baidu_hot():
        # Run Chrome without a visible window
        option = ChromeOptions()
        option.add_argument("--headless")
        url = 'https://voice.baidu.com/act/virussearch/virussearch?from=osari_map&tab=0&infomore=1'
        brower = Chrome(options=option)
        brower.get(url)
        # find_element_by_* is the Selenium 3 locator API;
        # Selenium 4 uses find_element(By.CSS_SELECTOR, ...) instead
        but = brower.find_element_by_css_selector('#ptab-0 > div > div.VirusHot_1-5-6_32AY4F.VirusHot_1-5-6_2RnRvg > section > div')
        but.click()
        # Give the page a moment to update after the click
        time.sleep(1)
        c = brower.find_elements_by_xpath('//*[@id="ptab-0"]/div/div[1]/section/a/div/span[2]')
        # Collect the text of every hot-search entry
        context = [i.text for i in c]
        print(context)
        return context

B. The Realization of the Data Analysis Module

This module presents the distribution of domestic and foreign epidemic data and analyzes the epidemic distribution, spread trend, development trend, and epidemic forecast simulation. The module interface is shown in the following figure.

a) Comparison of epidemics at home and abroad
The domestic and foreign epidemics are compared in four aspects: current confirmed cases, cumulative confirmed cases, cured cases, and deaths. A pie chart shows the proportion of each part to the whole.

The core code is as follows:

    from pyecharts import options as opts
    from pyecharts.charts import Pie
    from pyecharts.commons.utils import JsCode

    def new_label_opts():
        # fn is a JavaScript formatter string defined elsewhere in the project
        return opts.LabelOpts(formatter=JsCode(fn), position="center")

    pie = (Pie(init_opts=opts.InitOpts(theme='dark', width='1000px'))
        .add(
            "Diagnosed on the same day",
            [(x, y) for x, y in oversea_data['currentConfirmedCount'].items()],
            center=["30%", "30%"],
            radius=[60, 90],
            label_opts=new_label_opts(),
        )
        .add(
            "Cumulative diagnosis",
            [(x, y) for x, y in oversea_data['confirmedCount'].items()],

            center=["70%", "30%"],
            radius=[60, 90],
            label_opts=new_label_opts(),
        )
        .set_global_opts(
            title_opts=opts.TitleOpts(title="Comparison of epidemic data at home and abroad",
                                      subtitle="Update time:{}".format(update_date)),
            legend_opts=opts.LegendOpts(is_show=True),)
        .set_series_opts(
            tooltip_opts=opts.TooltipOpts(
                trigger="item", formatter="{a} <br/>{b}: {c} ({d}%)"
            ))
    )

b) National Epidemic Map
The national epidemic map clearly reflects the distribution of the epidemic: the more cumulative confirmed cases a region has, the darker its color.

The core code is as follows:

    _map = (
        Map(init_opts=opts.InitOpts(theme='dark', width='1000px'))
        .add("Cumulative confirmed number", cofirm, "china",
             is_map_symbol_show=False, is_roam=False)
        .set_series_opts(label_opts=opts.LabelOpts(is_show=True))
        .set_global_opts(
            title_opts=opts.TitleOpts(title="National Epidemic Map of the Novel Coronavirus",
                                      subtitle="Update time:{}".format(update_date)),
            legend_opts=opts.LegendOpts(is_show=False),
            visualmap_opts=opts.VisualMapOpts(is_show=True,
                                              max_=1000,
                                              is_piecewise=False,
                                              range_color=['#FFFFE0', '#FFA07A', '#CD5C5C', '#8B0000'])
        )
    )

c) Trend map of cumulative diagnoses nationwide
The nationwide cumulative diagnosis trend map intuitively reflects the development trend of the epidemic and, at the same time, compares the whole country, Hubei, and Wuhan to show the relationship between them.

The core code is as follows:

    for key_, value_ in data_type.items():
        line = (Line(init_opts=opts.InitOpts(theme='dark', width='1000px'))
            .add_xaxis([day.strftime('%Y-%m-%d') for day in time_range])
            .add_yaxis("The entire country", area_data('The entire country', value_),
                       is_smooth=True,
                       areastyle_opts=opts.AreaStyleOpts(opacity=0.5))
            .add_yaxis("Hubei", area_data('Hubei', value_),
                       is_smooth=True,
                       areastyle_opts=opts.AreaStyleOpts(opacity=0))
            .add_yaxis("Wuhan", area_data('Wuhan', value_),
                       is_smooth=True,
                       areastyle_opts=opts.AreaStyleOpts(opacity=0))
            .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
            .set_global_opts(
                title_opts=opts.TitleOpts(title="The trend graph {} of the entire country".format(key_),
                                          subtitle="Update time:{}".format(update_date)),
            ))

d) National epidemic heat map
The national epidemic heat map shows the severity of the epidemic in each region of the country at successive points in time, reflecting the spread and development of the epidemic.

The core code is as follows:

    for day in time_range:
        geo = (
            Geo(init_opts=opts.InitOpts(theme='dark'))
            .add_schema(maptype="china", zoom=1)
            .add("Current confirmed number",
                 [(key_, value_['currentConfirmedCount']) for key_, value_ in format_data[day].items()
                  if key_ in pyecharts.datasets.COORDINATES.keys() and value_['is_city'] == 1],
                 type_='heatmap',

                 symbol_size=3,
                 progressive=50)
            .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
            .set_global_opts(
                title_opts=opts.TitleOpts(title="New Coronavirus National Epidemic Heat Map",
                                          subtitle="Update time:{}".format(update_date)),
                legend_opts=opts.LegendOpts(is_show=False),
                visualmap_opts=opts.VisualMapOpts(
                    range_color=['blue', 'green', 'green', 'yellow', 'red']),
            )
        )

e) SEIR model prediction simulation
The SEIR simulation uses parameters based on the SARS epidemic: γ = 0.0821 and λ = 0.2586, with 10 million initially susceptible people, 10 initially infected, and 5 initially latent (exposed), so the total population of the city is N = 1e7 + 10 + 5. Substituting these values into the model gives the result.

The core code is as follows:

    # s, e, i, r are arrays of length T holding population fractions;
    # N, T, lamda, sigma and gamma are defined earlier in the program
    # initial infective people
    i[0] = 10.0 / N
    s[0] = 1e7 / N
    # initial exposed people (the text above states 5; the code uses 40)
    e[0] = 40.0 / N
    for t in range(T-1):
        s[t + 1] = s[t] - lamda * s[t] * i[t]
        e[t + 1] = e[t] + lamda * s[t] * i[t] - sigma * e[t]
        i[t + 1] = i[t] + sigma * e[t] - gamma * i[t]
        r[t + 1] = r[t] + gamma * i[t]
    # Plot the four compartments over time
    fig, ax = plt.subplots(figsize=(10,6))
    ax.plot(s, c='b', lw=2, label='S')
    ax.plot(e, c='orange', lw=2, label='E')
    ax.plot(i, c='r', lw=2, label='I')
    ax.plot(r, c='g', lw=2, label='R')
    ax.set_xlabel('Day', fontsize=20)
    ax.set_ylabel('Infective Ratio', fontsize=20)
    ax.grid(1)
    plt.xticks(fontsize=20)
    plt.yticks(fontsize=20)
    plt.legend()

f) Improved SEIR model
The improved SEIR model mainly analyzes the Hubei region. It uses the public data of Hubei Province as the basis for its parameters, considers the infectivity of latent persons and the various isolation, prevention, and control measures, and analyzes the impact of each factor on the development of the epidemic.

The core code is as follows:

    % Initial values of each compartment (MATLAB)
    [S1, S2] = deal(59170000);
    [E1, E2] = deal(4007);
    [I1, I2] = deal(786);
    [Sq1, Sq2] = deal(2776);
    [Eq1, Eq2] = deal(400);
    [H1, H2] = deal(1186);
    [R1, R2] = deal(31);
    T=1:150;
    % rho, c, beta, q, theta1, lambda, sigma, deltaI, deltaq, alpha,
    % gammaI and gammaH must be defined before the loop
    for idx =1:length(T)-1
        S1(idx+1)=S1(idx)-(rho*c*beta+rho*c*q*(1-beta))*S1(idx)*(I1(idx)+theta1*E1(idx))+lambda*Sq1(idx);
        E1(idx+1)=E1(idx)+rho*c*beta*(1-q)*S1(idx)*(I1(idx)+theta1*E1(idx))-sigma*E1(idx);
        I1(idx+1)=I1(idx)+sigma*E1(idx)-(deltaI+alpha+gammaI)*I1(idx);
        Sq1(idx+1)=Sq1(idx)+rho*c*q*(1-beta)*S1(idx)*(I1(idx)+theta1*E1(idx))-lambda*Sq1(idx);
        Eq1(idx+1)=Eq1(idx)+rho*c*beta*q*S1(idx)*(I1(idx)+theta1*E1(idx))-deltaq*Eq1(idx);
        % inflow from Eq is deltaq*Eq1, matching the outflow in the Eq1 update
        H1(idx+1)=H1(idx)+deltaI*I1(idx)+deltaq*Eq1(idx)-(alpha+gammaH)*H1(idx);
        R1(idx+1)=R1(idx)+gammaI*I1(idx)+gammaH*H1(idx);
    end

C. Implementation of the Visualization System

This module provides a visual display of the domestic epidemic situation, showing key epidemic data, a national epidemic map, an epidemic trend graph, a ranking of confirmed cases in non-Hubei cities, and an epidemic word cloud.

The core code is as follows:

    app = Flask(__name__)

    @app.route("/r2")
    def get_r2_data():
        data = utils.get_r2_data()

        # Example of the rows in data:
        # (('Police fight on the front line to fight the epidemic for 16 days and sacrifice 1037364',),
        #  ('Sichuan sends two more medical teams 1537382',))
        d = []
        for i in data:
            k = i[0].rstrip(string.digits)   # headline without the trailing count
            v = i[0][len(k):]                # trailing digits: the hot-search count
            ks = extract_tags(k)             # Jieba keyword extraction
            for j in ks:
                if not j.isdigit():
                    d.append({"name": j, "value": v})
        return jsonify({"kws": d})

IV. CONCLUSION

This paper discusses the analysis, design, and implementation of each module of the new crown epidemic data analysis and visualization system. The project first made a preliminary analysis of the needs of people concerned about the epidemic and divided the system into modules. A detailed design was then carried out, the functional goals of the system were defined, and the code for each function was written, realizing data collection, data analysis, and visual display. Finally, a system test was carried out to make the system more scientific and rigorous [5].

The data analysis and visualization system is divided into three parts: a data acquisition module, a data analysis module, and an analysis result display module. The platform is developed in Python 3, the Web framework is Flask, and the database is MySQL. Structured data is analyzed with Pandas, unstructured text is segmented with Jieba, and visualization uses ECharts. The development environments are Jupyter Notebook and PyCharm.

Tests show that the system has completed most of the functions identified in the requirements analysis. Owing to time constraints, the system still has various deficiencies. For example, some functions conflict because of later modifications, the details of the interface have yet to be polished, and some settings on the settings page affect many parts of the system and need to be revised gradually. Future work will continue to improve the system so that these problems can be resolved.

V. ACKNOWLEDGMENT

Dongmei Zhang is the corresponding author.

REFERENCES

[1] Chenghu Zhou, Fenzhen Su, et al. COVID-19: Challenges to GIS with Big Data[J]. Geography and Sustainability, 2020, 1(1): 77-87.
[2] Mengistie T T. COVID-19 Outbreak Data Analysis and Prediction Modeling Using Data Mining Technique[J]. International Journal of Computer (IJC), 2020, 38(1): 37-60.
[3] Al-Rousan N, Al-Najjar H. Data Analysis of Coronavirus CoVID-19 Epidemic in South Korea Based on Recovered and Death Cases[J]. Journal of Medical Virology, 2020.
[4] Avila-Ponce de León U, Pérez Á G C, Avila-Vales E. A data driven analysis and forecast of an SEIARD epidemic model for COVID-19 in Mexico[J]. 2020.
[5] Meiling Z, Kun W, Xiao L, et al. Epitranscriptome analysis of COVID-19 prevention and control[J]. Chinese Journal of Medical Science Research Management, 2020, 33(00): E002-E002.
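
As a supplementary sketch of the discrete SEIR recurrence used in Section III.B(e), the following self-contained version runs without plotting. It is only an illustration under stated assumptions: the function name simulate_seir is hypothetical, the value sigma = 0.25 is assumed because the paper gives γ and λ but not σ, and the initial exposed count of 5 follows the text rather than the code excerpt.

```python
def simulate_seir(n_days, lam, sigma, gamma, s0, e0, i0, r0=0.0):
    """Iterate the discrete SEIR recurrence on population fractions."""
    s, e, i, r = [s0], [e0], [i0], [r0]
    for t in range(n_days - 1):
        new_exposed = lam * s[t] * i[t]    # S -> E: contacts between S and I
        new_infected = sigma * e[t]        # E -> I: end of the latent period
        new_recovered = gamma * i[t]       # I -> R: recovery
        s.append(s[t] - new_exposed)
        e.append(e[t] + new_exposed - new_infected)
        i.append(i[t] + new_infected - new_recovered)
        r.append(r[t] + new_recovered)
    return s, e, i, r


# Parameters from the text; sigma is an assumed value, not from the paper.
N = 1e7 + 10 + 5
s, e, i, r = simulate_seir(
    n_days=150, lam=0.2586, sigma=0.25, gamma=0.0821,
    s0=1e7 / N, e0=5 / N, i0=10 / N,
)

# Day on which the infected fraction is largest within the simulated window
peak_day = max(range(len(i)), key=lambda t: i[t])
print("peak infected fraction on day", peak_day)
```

Because every term that leaves one compartment enters another, the four fractions always sum to 1, which is a quick sanity check on any change to the update rules.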

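
For reference, the update rules of the improved SEIR code in Section III.B(f) can be written as difference equations. This is a transcription of the code, not an independent derivation: the symbols mirror the variable names (rho, c, beta, q, theta1, lambda, sigma, deltaI, deltaq, alpha, gammaI, gammaH), whose epidemiological meanings the paper does not spell out. The hospitalized update is written with the product δ_q E_q, which is what the outflow term of the E_q equation implies.

```latex
\begin{aligned}
S_{t+1}     &= S_t - \bigl(\rho c \beta + \rho c q (1-\beta)\bigr)\, S_t \,(I_t + \theta_1 E_t) + \lambda S_{q,t} \\
E_{t+1}     &= E_t + \rho c \beta (1-q)\, S_t \,(I_t + \theta_1 E_t) - \sigma E_t \\
I_{t+1}     &= I_t + \sigma E_t - (\delta_I + \alpha + \gamma_I)\, I_t \\
S_{q,t+1}   &= S_{q,t} + \rho c q (1-\beta)\, S_t \,(I_t + \theta_1 E_t) - \lambda S_{q,t} \\
E_{q,t+1}   &= E_{q,t} + \rho c \beta q\, S_t \,(I_t + \theta_1 E_t) - \delta_q E_{q,t} \\
H_{t+1}     &= H_t + \delta_I I_t + \delta_q E_{q,t} - (\alpha + \gamma_H)\, H_t \\
R_{t+1}     &= R_t + \gamma_I I_t + \gamma_H H_t
\end{aligned}
```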

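
The word-cloud view in Section III.C separates each hot-search string into a headline and a trailing count before extracting keywords. A minimal sketch of that step, using only the standard library, is shown below. The helper names split_headline and split_keywords are hypothetical, and split_keywords is a plain whitespace split standing in for Jieba's keyword extraction, so its output will differ from the real system. The sample rows are the two printed in the paper.

```python
import string


def split_headline(row_text):
    """Split 'headline 12345' into ('headline', '12345')."""
    text = row_text.rstrip(string.digits)   # drop the trailing digits
    count = row_text[len(text):]            # what was dropped is the count
    return text.strip(), count


def split_keywords(text):
    # Stand-in for jieba.analyse.extract_tags: whitespace split,
    # skipping tokens that are purely numeric.
    return [word for word in text.split() if not word.isdigit()]


rows = (
    ("Police fight on the front line to fight the epidemic for 16 days and sacrifice 1037364",),
    ("Sichuan sends two more medical teams 1537382",),
)

kws = []
for (row_text,) in rows:
    text, count = split_headline(row_text)
    for word in split_keywords(text):
        kws.append({"name": word, "value": count})

print(kws[:3])
```

Each keyword inherits the count of the headline it came from, which is the weight the word cloud uses to size it.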