Covid Data Report

This project report presents an analysis and visualization of COVID-19 data, focusing on scientific research activities and statistical insights derived from a dataset of over 40,000 records. The analysis employs Python for data scraping, preprocessing, and visualization using Plotly Express, aiming to enhance understanding of the pandemic's impact and management strategies. The report includes project requirements, system design, coding examples, and future scope, contributing to the field of data science and visual analytics.

Uploaded by

amrutabaheti2019

Maharashtra Shikshan Samiti’s

Maharashtra Mahavidyalaya,
Nilanga
NAAC Re-accredited “B+” Grade College
Affiliated to
Swami Ramanand Teerth Marathwada University-Nanded

A Project Report on
Covid Data Analysis and visualization

Submitted for the award of degree of

Master of Science

By:
Baheti Amruta Ishwarprasad

In
Year: 2024 – 2025
Maharashtra Shikshan Samiti’s

Maharashtra Mahavidyalaya,
Nilanga

CERTIFICATE
This is to certify that the project entitled “Covid Data Analysis
and visualization” has been carried out by Baheti Amruta
Ishwarprasad under my guidance in partial fulfillment of the
requirements for the degree of Bachelor of Computer Science of SRTMU,
Nanded, during the academic year 2023-2024.

Guide Internal Examiner External Examiner


HOD Principal
Department of
Computer Science
ACKNOWLEDGEMENT

We would like to convey our gratitude to Dr. M. N. Kolpuke, Principal of
Maharashtra Mahavidyalaya, Nilanga, who always encourages us to take up
research and development projects.

We are grateful to Swami Ramanand Teerth Marathwada University, Nanded,
for giving us the opportunity to deliver this project.

We would like to thank our Project Guide, who guided us through the project
development process, provided invaluable advice, helped us in difficult
periods and provided practical assistance for our project. Their willingness
to motivate us contributed tremendously to the success of this project.

We would like to express our special thanks to our Head of Department,
Mr. Patil G. S., and the Co-ordinator of the Department of Computer
Science, Mr. Kiwade D. S., who helped us a lot in finalizing this project.

We would also like to thank all staff members who helped us by giving
advice and providing the equipment we needed.

Last but not least, we would like to thank all who helped and motivated us.

With Sincere Thanks,


1. Baheti Amruta Ishwarprasad
Index

Sr. No.  Topic Name
1        Abstract
2        Introduction
2.1      Project Overview
2.2      Project Plan (Gantt chart)
3        Project Requirement
3.1      Hardware Requirement
3.2      Software Requirement
3.3      Front End
4        System Design
4.1      E-R Diagram
4.2      Data Flow Diagram
5        Designing
6        Coding
7        Future scope of project
8        Conclusion
9        Bibliography
9.1      Book(s)
9.2      Website(s)
1. Abstract:-

COVID-19 has been recognized as a global threat, and several studies are being
conducted to contribute to the fight against and prevention of this pandemic. This work
presents a scholarly production dataset focused on COVID-19, providing an overview of
scientific research activities and making it possible to identify countries and their total
test and death cases. The dataset is composed of 40,212 records of articles’ metadata
collected from the Scopus, PubMed, arXiv and bioRxiv databases from January 2019 to July 2020.

The data were extracted using Python web scraping techniques, preprocessed with
Pandas data wrangling, and visualized using Plotly Express in Python, which is the primary
visualization tool for this project. It is used to create dozens of bar charts, line graphs,
bubble charts and scatter plots. The analysis and visualization help people understand
complex scenarios and make predictions about the future from the current situation.

This analysis summarizes the modeling, simulation, and analytics work around the
worldwide COVID-19 outbreak from the perspective of data science and visual
analytics. It examines the impact of best practices and preventive measures in various sectors
and helps outbreaks to be managed with the available health resources.

2. Introduction:-

2.1) Project overview:-

Coronaviruses are a family of viruses that can cause respiratory illness in humans.
They are called “corona” because of the crown-like spikes on the surface of the virus. Severe
acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS) and the common
cold are examples of illnesses that coronaviruses cause in humans.
The new strain of coronavirus, SARS-CoV-2, was first reported in Wuhan,
China in December 2019. It has since spread to countries around the world.

This project introduces learners to an array of skills as they create a data
visualization dashboard focusing on COVID-19 data using Python. Data visualization is an
essential part of any data science project and offers valuable insights for understanding
and interpreting data.

2.2) Project Plan:-
First, we will analyze the COVID-19 data and visualize it using Plotly
Express in Python, the primary visualization tool for this project. The focus
will be on generating a variety of visual representations, including bar charts,
line graphs, bubble charts, and scatter plots.
Through this analysis and visualization, readers will gain insights into
complex scenarios and be able to make informed predictions regarding future
developments based on the current data.
This analysis encapsulates the modeling, simulation, and analytical efforts
related to the global COVID-19 pandemic from a data science and visual analytics
perspective. It assesses the effectiveness of best practices and preventive measures
across different sectors, facilitating the management of outbreaks with the health
resources available.
Gantt Chart:-
Sr. No.  Task Name                 21-Sept  5-Oct  24-Oct  11-Nov  27-Nov
1        Requirement Gathering
2        Planning
3        Designing
4        Coding
5        Testing and Deployment

3. Project Requirement:-
3.1) Hardware Requirements:-

• Processor: i3 and above
• RAM: 4 GB and above
• Hard Disk: 100 GB or above
• Input device: Keyboard, Mouse
• Output device: Monitor or LCD/LED

3.2) Software Requirement:-

• IDE: Jupyter notebook, Spyder, Anaconda
• Language: Python (version 3.8 and above)

3.3) IDE :-

IDE : Jupyter notebook

4. System Design:-

4.1) ER Diagram:-

4.2) Data Flow Diagram:-

START

COVID-19 data collected from various online resources

Total Cases Death Cases

Data Preparation

Data Preprocessing
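
The flow above (collect, prepare, preprocess) can be sketched with pandas as follows. This is only an illustration under stated assumptions: the in-memory CSV stands in for the collected `covid.csv` file and holds three rows and a few columns taken from the output in the Coding section, and the function name `prepare` and its two cleaning steps are hypothetical, not the project's actual pipeline.

```python
import io

import pandas as pd

# Stand-in for the collected covid.csv file: three rows from the head() output,
# reduced to a few columns. NewCases is empty in these rows, as in the report.
RAW_CSV = """Country/Region,TotalCases,NewCases,TotalDeaths,TotalTests
Brazil,2917562,,98644,13206188
USA,5032179,,162804,63139605
India,2025409,,41638,22149351
"""


def prepare(csv_source):
    """Load the raw data and apply two simple preprocessing steps (hypothetical)."""
    df = pd.read_csv(csv_source)
    # Data preparation: drop columns that contain no values at all
    df = df.dropna(axis=1, how="all")
    # Data preprocessing: order countries by total cases, largest first
    return df.sort_values("TotalCases", ascending=False).reset_index(drop=True)


clean = prepare(io.StringIO(RAW_CSV))
print(clean[["Country/Region", "TotalCases", "TotalDeaths"]])
```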

5. Designing:-
• Home Page:

Coding and output pages:
6. Coding:-

# Data analysis and manipulation
import pandas as pd

# Data visualization
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objs as go
import plotly.io as pio
import plotly.offline as py

# Initializing Plotly for notebook output
py.init_notebook_mode(connected=True)
pio.renderers.default = 'colab'
#pio.renderers.default = "png"

# Importing Dataset1
dataset1 = pd.read_csv("covid.csv")
dataset1.head() # returns first 5 rows
   Country/Region      Continent    Population  TotalCases  NewCases  TotalDeaths  NewDeaths  TotalRecovered  NewRecovered
0             USA  North America  3.311981e+08     5032179       NaN     162804.0        NaN       2576668.0           NaN
1          Brazil  South America  2.127107e+08     2917562       NaN      98644.0        NaN       2047660.0           NaN
2           India           Asia  1.381345e+09     2025409       NaN      41638.0        NaN       1377384.0           NaN
3          Russia         Europe  1.459409e+08      871894       NaN      14606.0        NaN        676357.0           NaN
4    South Africa         Africa  5.938157e+07      538184       NaN       9604.0        NaN        387316.0           NaN

   ActiveCases  Serious,Critical  Tot Cases/1M pop  Deaths/1M pop  TotalTests  Tests/1M pop      WHO Region  iso_alpha
0    2292707.0           18296.0           15194.0          492.0  63139605.0      190640.0        Americas        USA
1     771258.0            8318.0           13716.0          464.0  13206188.0       62085.0        Americas        BRA
2     606387.0            8944.0            1466.0           30.0  22149351.0       16035.0  South-EastAsia        IND
3     180931.0            2300.0            5974.0          100.0  29716907.0      203623.0          Europe        RUS
4     141264.0             539.0            9063.0          162.0   3149807.0       53044.0          Africa        ZAF
# Returns tuple of shape (Rows, columns)
print(dataset1.shape)
# Returns size of dataframe
print(dataset1.size)
(209, 17)
3553
# Information about Dataset1
# return concise summary of dataframe
dataset1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 209 entries, 0 to 208
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Country/Region 209 non-null object
1 Continent 208 non-null object
2 Population 208 non-null float64
3 TotalCases 209 non-null int64
4 NewCases 4 non-null float64
5 TotalDeaths 188 non-null float64
6 NewDeaths 3 non-null float64
7 TotalRecovered 205 non-null float64
8 NewRecovered 3 non-null float64
9 ActiveCases 205 non-null float64
10 Serious,Critical 122 non-null float64
11 Tot Cases/1M pop 208 non-null float64
12 Deaths/1M pop 187 non-null float64
13 TotalTests 191 non-null float64
14 Tests/1M pop 191 non-null float64
15 WHO Region 184 non-null object
16 iso_alpha 209 non-null object
dtypes: float64(12), int64(1), object(4)
memory usage: 27.9+ KB
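
The non-null counts in this summary show that three columns are almost empty. As a small sketch, the share of missing values per column can be computed from those counts; the counts and the row total are copied from the `info()` output above, while the 90% cut-off (`THRESHOLD`) is an assumption chosen for illustration. On the loaded DataFrame itself, `dataset1.isna().mean()` gives the same shares directly.

```python
import pandas as pd

TOTAL_ROWS = 209  # RangeIndex: 209 entries

# Non-null counts copied from the dataset1.info() output
non_null = pd.Series({
    "Country/Region": 209, "Continent": 208, "Population": 208,
    "TotalCases": 209, "NewCases": 4, "TotalDeaths": 188, "NewDeaths": 3,
    "TotalRecovered": 205, "NewRecovered": 3, "ActiveCases": 205,
    "Serious,Critical": 122, "Tot Cases/1M pop": 208, "Deaths/1M pop": 187,
    "TotalTests": 191, "Tests/1M pop": 191, "WHO Region": 184,
    "iso_alpha": 209,
})

# Fraction of rows with no value in each column
missing_share = 1 - non_null / TOTAL_ROWS

THRESHOLD = 0.9  # assumed cut-off: columns with more than 90% missing values
sparse_columns = missing_share[missing_share > THRESHOLD].index.tolist()
print(sparse_columns)
```

Under this assumed threshold, the columns selected are the ones that carry too little information to plot.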
# Columns labels of a Dataset1
dataset1.columns
Index(['Country/Region', 'Continent', 'Population', 'TotalCases', 'NewCases',
'TotalDeaths', 'NewDeaths', 'TotalRecovered', 'NewRecovered',
'ActiveCases', 'Serious,Critical', 'Tot Cases/1M pop', 'Deaths/1M pop',
'TotalTests', 'Tests/1M pop', 'WHO Region', 'iso_alpha'],
dtype='object')
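
The per-million columns appear to be the raw totals scaled by population. The sketch below checks this against the USA row of the `head()` output; the function name `per_million` is hypothetical, and the population value is the rounded figure printed above, so this is a consistency check on the displayed numbers and not a statement about how the data source computed them.

```python
def per_million(count, population):
    """Scale a raw count to a rate per one million inhabitants."""
    return count / population * 1_000_000


# USA row from the dataset1.head() output
population = 3.311981e+08
cases_per_million = per_million(5032179, population)
deaths_per_million = per_million(162804.0, population)
tests_per_million = per_million(63139605.0, population)

# Rounded to whole numbers for comparison with the table columns
print(round(cases_per_million), round(deaths_per_million), round(tests_per_million))
```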
import plotly
import kaleido
print(plotly.__version__, kaleido.__version__)
5.24.1 0.2.1
#!pip install kaleido
# Bar chart of total cases for the first 15 rows of the dataset
fig = px.bar(dataset1.head(15), x = 'Country/Region',
             y = 'TotalCases', color = 'TotalCases',
             height = 500, hover_data = ['Country/Region', 'Continent'])
fig.show()
# Optional: save the chart as a static image (requires kaleido)
#fig.write_image("abc.png")
# Drop the NewCases, NewDeaths, NewRecovered columns from dataset1
dataset1.drop(['NewCases', 'NewDeaths', 'NewRecovered'],
              axis=1, inplace=True)
# Select random set of values from dataset1
dataset1.sample(5)
     Country/Region  Continent   Population  TotalCases  TotalDeaths  TotalRecovered  ActiveCases
155         Lesotho     Africa    2143943.0         742         23.0           175.0        544.0
101        Maldives       Asia     541448.0        4680         19.0          2725.0       1936.0
113        Eswatini     Africa    1161348.0        2968         55.0          1476.0       1437.0
 26           Egypt     Africa  102516525.0       95006       4951.0         48898.0      41157.0
 77      Madagascar     Africa   27755708.0       12526        134.0         10148.0       2244.0

     Serious,Critical  Tot Cases/1M pop  Deaths/1M pop  TotalTests  Tests/1M pop            WHO Region  iso_alpha
155               NaN             346.0           11.0      8771.0        4091.0                Africa        LSO
101              12.0            8643.0           35.0     85587.0      158071.0        South-EastAsia        MDV
113               5.0            2556.0           47.0     20784.0       17896.0                Africa        SWZ
 26              41.0             927.0           48.0    135000.0        1317.0  EasternMediterranean        EGY
 77              88.0             451.0            5.0     46301.0        1668.0                Africa        MDG
# Total cases per country, coloured by total deaths
px.bar(dataset1.head(15), x = 'Country/Region', y = 'TotalCases',
       color = 'TotalDeaths', height = 500,
       hover_data = ['Country/Region', 'Continent'])
# Total cases per country, coloured by total tests
px.bar(dataset1.head(15), x = 'Country/Region', y = 'TotalCases',
       color = 'TotalTests', height = 500,
       hover_data = ['Country/Region', 'Continent'])
# Horizontal bars: total tests per country
px.bar(dataset1.head(15), x = 'TotalTests', y = 'Country/Region',
       color = 'TotalTests', orientation = 'h', height = 500,
       hover_data = ['Country/Region', 'Continent'])
# Horizontal bars: total tests per continent (country bars are stacked)
px.bar(dataset1.head(15), x = 'TotalTests', y = 'Continent',
       color = 'TotalTests', orientation = 'h', height = 500,
       hover_data = ['Country/Region', 'Continent'])
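
In the continent chart, each country contributes its own bar segment to its continent. If a single total per continent is wanted, the rows can be aggregated first. The sketch below shows one way to do this with pandas `groupby`; the seven rows are copied from the `head()` and `sample()` outputs in this section, and the variable names are hypothetical. It is an optional alternative, not the approach used for the charts above.

```python
import pandas as pd

# Rows copied from the dataset1.head() and dataset1.sample(5) outputs
rows = pd.DataFrame({
    "Country/Region": ["India", "Maldives", "South Africa", "Lesotho",
                       "Eswatini", "Egypt", "Madagascar"],
    "Continent": ["Asia", "Asia", "Africa", "Africa",
                  "Africa", "Africa", "Africa"],
    "TotalTests": [22149351, 85587, 3149807, 8771, 20784, 135000, 46301],
})

# One row per continent with the summed number of tests, largest first
tests_by_continent = (
    rows.groupby("Continent", as_index=False)["TotalTests"]
        .sum()
        .sort_values("TotalTests", ascending=False)
        .reset_index(drop=True)
)
print(tests_by_continent)
```

The aggregated frame can then be passed to `px.bar` with `x='TotalTests'` and `y='Continent'` in the same way as the calls above.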
import plotly
plotly.io.kaleido.scope.mathjax = None
fig = px.scatter(dataset1, x='Continent',y='TotalCases',
hover_data=['Country/Region', 'Continent'],
color='TotalCases', size='TotalCases', size_max=80)
fig.write_html("scatter_plot.html")
# Bubble chart: total cases by continent (first 57 rows, log-scaled y-axis)
px.scatter(dataset1.head(57), x='Continent', y='TotalCases',
           hover_data=['Country/Region', 'Continent'],
           color='TotalCases', size='TotalCases', size_max=80, log_y=True)
# Bubble chart: total tests by continent
px.scatter(dataset1.head(54), x='Continent', y='TotalTests',
           hover_data=['Country/Region', 'Continent'],
           color='TotalTests', size='TotalTests', size_max=80)
# Same chart with a log-scaled y-axis
px.scatter(dataset1.head(50), x='Continent', y='TotalTests',
           hover_data=['Country/Region', 'Continent'],
           color='TotalTests', size='TotalTests', size_max=80, log_y=True)
# Bubble chart: total cases by country (first 100 rows)
px.scatter(dataset1.head(100), x='Country/Region', y='TotalCases',
           hover_data=['Country/Region', 'Continent'],
           color='TotalCases', size='TotalCases', size_max=80)
# Total cases by country, one colour per country, log-scaled y-axis
px.scatter(dataset1.head(30), x='Country/Region', y='TotalCases',
           hover_data=['Country/Region', 'Continent'],
           color='Country/Region', size='TotalCases', size_max=80, log_y=True)
# Total deaths for the first 10 rows
px.scatter(dataset1.head(10), x='Country/Region', y='TotalDeaths',
           hover_data=['Country/Region', 'Continent'],
           color='Country/Region', size='TotalDeaths', size_max=80)
# Tests per million population, one colour per country
px.scatter(dataset1.head(30), x='Country/Region', y='Tests/1M pop',
           hover_data=['Country/Region', 'Continent'],
           color='Country/Region', size='Tests/1M pop', size_max=80)
# Tests per million population, coloured by value
px.scatter(dataset1.head(30), x='Country/Region', y='Tests/1M pop',
           hover_data=['Country/Region', 'Continent'],
           color='Tests/1M pop', size='Tests/1M pop', size_max=80)
# Total deaths against total cases
px.scatter(dataset1.head(30), x='TotalCases', y='TotalDeaths',
           hover_data=['Country/Region', 'Continent'],
           color='TotalDeaths', size='TotalDeaths', size_max=80)
# Same chart on log-log axes
px.scatter(dataset1.head(30), x='TotalCases', y='TotalDeaths',
           hover_data=['Country/Region', 'Continent'],
           color='TotalDeaths', size='TotalDeaths', size_max=80,
           log_x=True, log_y=True)
# Total cases against total tests on log-log axes
px.scatter(dataset1.head(30), x='TotalTests', y='TotalCases',
           hover_data=['Country/Region', 'Continent'],
           color='TotalTests', size='TotalTests', size_max=80,
           log_x=True, log_y=True)
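from zipfile import ZipFile
import os

# Collect all PNG images from the folder into one zip archive
folder = "C:/Users/Rushikesh/Documents"
with ZipFile('yourzipfilename.zip', 'w') as zipObj:
    for filename in os.listdir(folder):
        if filename.endswith(".png"):
            # Join with the folder so the file is found from any working directory
            zipObj.write(os.path.join(folder, filename), arcname=filename)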

7. Future scope of project:-

To further evaluate our tool, we applied it to COVID-19 epidemiological
data for Canada [36] captured by the Public Health Agency of Canada (PHAC)
and Statistics Canada. The data contain administrative information, case
details, symptom-related information, clinical course and outcomes, as well
as exposure methods, for all 107,916 captured cases from January 25 (when
the first case was confirmed in Canada) to August 06, 2020.

Our tool conducts visual analytics on the data to discover frequent patterns
and visualizes the discovered knowledge by displaying interesting
information in the form of a pie chart for each frequent 1-itemset (and its
related information) and a sunburst diagram for each frequent k-itemset
(and its related information, for k > 1). As previewed in Example 1, our tool
discovers that 90% of cases were transmitted through domestic acquisition
(i.e., community exposures). See Fig. 9(a), which also shows that, among
the remaining 10% of cases, 4% were transmitted through international
travel (i.e., travel exposures) and 6% were unstated transmission (i.e.,
NULL).

To avoid distraction from NULL values, our tool provides users with
the flexibility of visualizing only non-NULL values. See Fig. 9(b), which
focuses on the 90%+4% = 94% of cases (i.e., those with stated/known values).
As previewed in Example 2, our tool reveals that 90/94 ≈ 96% of cases with
stated/known transmission methods were transmitted through domestic
acquisition, whereas the remaining 4/94 ≈ 4% were transmitted through
international travel.
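
The arithmetic in the paragraph above (dropping the 6% of NULL cases and rescaling the remaining shares so that they sum to 100%) can be sketched as follows. The percentages are taken from the text; the function name `renormalize` and the use of `None` as the key for unstated transmission are assumptions made for illustration.

```python
def renormalize(shares):
    """Drop the None (unstated) category and rescale the rest to sum to 1."""
    known = {name: value for name, value in shares.items() if name is not None}
    total = sum(known.values())
    return {name: value / total for name, value in known.items()}


# Percentages of cases by transmission method, as stated in the text
shares = {"domestic acquisition": 90, "international travel": 4, None: 6}

known_shares = renormalize(shares)
for method, share in known_shares.items():
    # Format each share as a whole-number percentage
    print(f"{method}: {share:.0%}")
```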

8. Conclusion:-

To further evaluate our tool, we applied it to COVID-19 epidemiological
data for Canada [36] captured by the Public Health Agency of Canada (PHAC)
and Statistics Canada. The data contain administrative information, case
details, symptom-related information, clinical course and outcomes, as well
as exposure methods, for all 107,916 captured cases from January 25 (when
the first case was confirmed in Canada) to August 06, 2020.
Our tool conducts visual analytics on the data to discover frequent patterns
and visualizes the discovered knowledge by displaying interesting
information in the form of a pie chart for each frequent 1-itemset (and its
related information) and a sunburst diagram for each frequent k-itemset
(and its related information, for k > 1). As previewed in Example 1, our tool
discovers that 90% of cases were transmitted through domestic acquisition
(i.e., community exposures). See Fig. 9(a), which also shows that, among
the remaining 10% of cases, 4% were transmitted through international
travel (i.e., travel exposures) and 6% were unstated transmission (i.e.,
NULL).

To avoid distraction from NULL values, our tool provides users with
flexibility of visualizing non-NULL values. See Fig. 9(b), which focuses on
the 90%+4% = 94% of cases (i.e., those with stated/known values). As
previewed in Example 2, our tool reveals that 90/94 ≈ 96% of cases with
stated/known transmission methods were transmitted through domestic
acquisition, whereas the remaining 4/94 ≈ 4% were transmitted through
international travel.

9. Bibliography:-

9.1) Books:
Published in: 2024 5th International Conference on Innovative Trends in
Information Technology (ICITIIT)

Epidemiological and clinical characteristics of 161 discharged cases with
coronavirus disease 2019 in Shanghai, China

The novel coronavirus, 2019-nCoV, is highly contagious and more infectious than
initially estimate
