My P Report

Uploaded by

Mayank Sajwan
ICFAI UNIVERSITY DEHRADUN

PROJECT REPORT:
COVID-19 IMPACTS ANALYSIS
B.Sc. (Data Science)

Submitted by: Mayank Sajwan (22STUCDDD04007)
Submitted to: Dr. Nishant Mathur
PROJECT REPORT:

COVID-19 IMPACTS ANALYSIS (CASE STUDY)


The first wave of Covid-19 hit the global economy hard because the world was not prepared for a
pandemic. It led to a rise in cases, deaths, unemployment and poverty, and in turn to an
economic slowdown. The task here is to analyze the spread of Covid-19 cases and the impact of
Covid-19 on the economy.

DATA SET:
The dataset used to analyze the impacts of Covid-19 was downloaded from Kaggle. It contains
data about:

• Country code
• Country name
• Date of the record
• Human development index of each country
• Daily Covid-19 cases
• Daily deaths due to Covid-19
• Stringency index of each country
• Population of each country
• GDP per capita of each country

IMPORTING THE NECESSARY PYTHON LIBRARIES AND THE DATASET:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

data = pd.read_csv("transformed_data.csv")
data2 = pd.read_csv("/raw_data.csv")
print(data)
      CODE      COUNTRY        DATE    HDI        TC        TD       STI  \
0      AFG  Afghanistan  2019-12-31  0.498  0.000000  0.000000  0.000000
1      AFG  Afghanistan  2020-01-01  0.498  0.000000  0.000000  0.000000
2      AFG  Afghanistan  2020-01-02  0.498  0.000000  0.000000  0.000000
3      AFG  Afghanistan  2020-01-03  0.498  0.000000  0.000000  0.000000
4      AFG  Afghanistan  2020-01-04  0.498  0.000000  0.000000  0.000000
...    ...          ...         ...    ...       ...       ...       ...
50413  ZWE     Zimbabwe  2020-10-15  0.535  8.994048  5.442418  4.341855
50414  ZWE     Zimbabwe  2020-10-16  0.535  8.996528  5.442418  4.341855
50415  ZWE     Zimbabwe  2020-10-17  0.535  8.999496  5.442418  4.341855
50416  ZWE     Zimbabwe  2020-10-18  0.535  9.000853  5.442418  4.341855
50417  ZWE     Zimbabwe  2020-10-19  0.535  9.005405  5.442418  4.341855

             POP    GDPCAP
0      17.477233  7.497754
1      17.477233  7.497754
2      17.477233  7.497754
3      17.477233  7.497754
4      17.477233  7.497754
...          ...       ...
50413  16.514381  7.549491
50414  16.514381  7.549491
50415  16.514381  7.549491
50416  16.514381  7.549491
50417  16.514381  7.549491

[50418 rows x 9 columns]


The dataset covers Covid-19 cases and their impact on GDP from December 31, 2019, to
October 19, 2020, the last date shown in the output above.
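
The date range can be checked directly rather than read off the printed rows. The following is a minimal sketch: the helper name date_range is hypothetical, and the small frame stands in for the real `data` frame, whose DATE column is assumed to hold ISO-formatted date strings as in the output above.

```python
import pandas as pd


def date_range(frame: pd.DataFrame, column: str = "DATE"):
    """Return the earliest and latest date found in `column`."""
    dates = pd.to_datetime(frame[column])
    return dates.min(), dates.max()


# Tiny stand-in frame; with the real file you would pass `data` instead.
sample = pd.DataFrame({"DATE": ["2020-01-01", "2019-12-31", "2020-10-19"]})
first, last = date_range(sample)
print(first.date(), "to", last.date())
```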

DATA PREPARATION:
The dataset consists of two files: one contains the raw data and the other a transformed version
of it. We need both for this task, because each holds important information in different
columns. So let’s have a look at the two datasets one by one:

print(data.head())  # first five rows of dataset 1 (transformed_data)

  CODE      COUNTRY        DATE    HDI   TC   TD  STI        POP    GDPCAP
0  AFG  Afghanistan  2019-12-31  0.498  0.0  0.0  0.0  17.477233  7.497754
1  AFG  Afghanistan  2020-01-01  0.498  0.0  0.0  0.0  17.477233  7.497754
2  AFG  Afghanistan  2020-01-02  0.498  0.0  0.0  0.0  17.477233  7.497754
3  AFG  Afghanistan  2020-01-03  0.498  0.0  0.0  0.0  17.477233  7.497754
4  AFG  Afghanistan  2020-01-04  0.498  0.0  0.0  0.0  17.477233  7.497754

print(data2.head())  # first five rows of dataset 2 (raw_data)

  iso_code     location        date  total_cases  total_deaths  \
0      AFG  Afghanistan  2019-12-31          0.0           0.0
1      AFG  Afghanistan  2020-01-01          0.0           0.0
2      AFG  Afghanistan  2020-01-02          0.0           0.0
3      AFG  Afghanistan  2020-01-03          0.0           0.0
4      AFG  Afghanistan  2020-01-04          0.0           0.0

   stringency_index  population  gdp_per_capita  human_development_index  \
0               0.0    38928341        1803.987                    0.498
1               0.0    38928341        1803.987                    0.498
2               0.0    38928341        1803.987                    0.498
3               0.0    38928341        1803.987                    0.498
4               0.0    38928341        1803.987                    0.498

  Unnamed: 9 Unnamed: 10 Unnamed: 11  Unnamed: 12  Unnamed: 13
0      #NUM!       #NUM!       #NUM!    17.477233  7.497754494
1      #NUM!       #NUM!       #NUM!    17.477233  7.497754494
2      #NUM!       #NUM!       #NUM!    17.477233  7.497754494
3      #NUM!       #NUM!       #NUM!    17.477233  7.497754494
4      #NUM!       #NUM!       #NUM!    17.477233  7.497754494
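
Comparing the two outputs suggests how the transformed file was derived: the POP and GDPCAP values look like natural logarithms of the raw population and gdp_per_capita. This is an inference from the printed numbers, not something the dataset documents. A quick check using the Afghanistan values copied from the outputs above (the variable names are illustrative only):

```python
import math

# Values copied from the first row of the two printed outputs (Afghanistan).
raw_population = 38928341
raw_gdp_per_capita = 1803.987
transformed_pop = 17.477233
transformed_gdpcap = 7.497754

# Assumption being tested: transformed value = natural log of the raw value.
log_pop = math.log(raw_population)
log_gdp = math.log(raw_gdp_per_capita)

print("POP    difference:", abs(log_pop - transformed_pop))
print("GDPCAP difference:", abs(log_gdp - transformed_gdpcap))
```

If the differences are negligible, the #NUM! cells are consistent with a spreadsheet log of 0 (for example zero cases on the first days), which is undefined.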

From this first look at both datasets, it is clear that we need to combine them into a new
dataset. But before we create it, let’s have a look at how many samples of each country are
present in the dataset:

data["COUNTRY"].value_counts()

COUNTRY
Afghanistan 294
Indonesia 294
Macedonia 294
Luxembourg 294
Lithuania 294
...
Tajikistan 172
Comoros 171
Lesotho 158
Hong Kong 51
Solomon Islands 4
Name: count, Length: 210, dtype: int64

So we don’t have an equal number of samples of each country in the dataset. Let’s have a look
at the mode value:

data["COUNTRY"].value_counts().mode()

0 294
Name: count, dtype: int64

So 294 is the mode value. We will use it as the divisor when averaging per-country values such as
the human development index and the stringency index. Now let’s create a new dataset by
combining the necessary columns from both the datasets:

# AGGREGATING THE DATA
code = data["CODE"].unique().tolist()
country = data["COUNTRY"].unique().tolist()
hdi = []
tc = []
td = []
sti = []
population = []
gdp = []  # not filled here: GDP per capita is added manually later

for i in country:
    # HDI and stringency index: per-country sum divided by the mode sample count (294)
    hdi.append((data.loc[data["COUNTRY"] == i, "HDI"]).sum()/294)
    tc.append((data2.loc[data2["location"] == i, "total_cases"]).sum())
    td.append((data2.loc[data2["location"] == i, "total_deaths"]).sum())
    sti.append((data.loc[data["COUNTRY"] == i, "STI"]).sum()/294)
    # POP appears constant within a country in the output above, so take its first value
    population.append(data.loc[data["COUNTRY"] == i, "POP"].iloc[0])

aggregated_data = pd.DataFrame(list(zip(code, country, hdi, tc, td, sti, population)),
                               columns=["Country Code", "Country", "HDI",
                                        "Total Cases", "Total Deaths",
                                        "Stringency Index", "Population"])
print(aggregated_data.head())

NEW DATASET (AGGREGATED_DATA):
  Country Code      Country       HDI  Total Cases  Total Deaths  \
0          AFG  Afghanistan  0.498000    5126433.0      165875.0
1          ALB      Albania  0.600765    1071951.0       31056.0
2          DZA      Algeria  0.754000    4893999.0      206429.0
3          AND      Andorra  0.659551     223576.0        9850.0
4          AGO       Angola  0.418952     304005.0       11820.0

   Stringency Index  Population
0          3.049673   17.477233
1          3.005624   14.872537
2          3.195168   17.596309
3          2.677654   11.254996
4          2.965560   17.307957
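
Two properties of the aggregation loop are worth keeping in mind. It divides by 294 even for countries with fewer rows (Hong Kong has 51), which pulls their averages down, and it sums total_cases and total_deaths over all dates; if those columns are running totals, as their names suggest, the sum is much larger than the actual count. The sketch below shows one possible alternative with groupby. It is not the method used in this report and would give different numbers from those printed here; the miniature frame is made up for illustration, and treating the totals as cumulative is an assumption.

```python
import pandas as pd

# Hypothetical miniature of the raw file: two countries with unequal row counts.
raw = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B"],
    "total_cases": [0, 5, 12, 1, 3],        # assumed to be cumulative
    "total_deaths": [0, 0, 1, 0, 0],        # assumed to be cumulative
    "human_development_index": [0.5, 0.5, 0.5, 0.8, 0.8],
    "stringency_index": [0.0, 40.0, 80.0, 20.0, 60.0],
    "population": [1000, 1000, 1000, 2000, 2000],
})

# mean() divides by each country's own row count; max() picks the latest
# value of a running total instead of adding the daily snapshots together.
summary = raw.groupby("location").agg(
    hdi=("human_development_index", "mean"),
    total_cases=("total_cases", "max"),
    total_deaths=("total_deaths", "max"),
    stringency_index=("stringency_index", "mean"),
    population=("population", "first"),
).reset_index()
print(summary)
```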

I have not included the GDP per capita column yet, because I could not find correct GDP per
capita figures in the dataset. It is therefore better to collect the GDP per capita data
manually.
With so many countries in this data, collecting the GDP per capita of all of them by hand is
not practical. So let’s select a subsample from this dataset: the top 10 countries with the
highest number of Covid-19 cases. This gives a manageable sample for studying the economic
impacts of Covid-19. So let’s sort the data according to the total cases of Covid-19:

# SORTING DATA ACCORDING TO TOTAL CASES:
data = aggregated_data.sort_values(by=["Total Cases"], ascending=False)
print(data.head())
    Country Code        Country      HDI  Total Cases  Total Deaths  \
200          USA  United States  0.92400  746014098.0    26477574.0
27           BRA         Brazil  0.75900  425704517.0    14340567.0
90           IND          India  0.64000  407771615.0     7247327.0
157          RUS         Russia  0.81600  132888951.0     2131571.0
150          PER           Peru  0.59949   74882695.0     3020038.0

     Stringency Index  Population
200          3.350949   19.617637
27           3.136028   19.174732
90           3.610552   21.045353
157          3.380088   18.798668
150          3.430126   17.311165

Now here’s how we can select the top 10 countries with the highest number of cases:

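# TOP 10 COUNTRIES WITH HIGHEST COVID CASES
# .copy() gives an independent frame, so adding columns later does not
# trigger pandas' SettingWithCopyWarning
data = data.head(10).copy()
print(data)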
    Country Code         Country       HDI  Total Cases  Total Deaths  \
200          USA   United States  0.924000  746014098.0    26477574.0
27           BRA          Brazil  0.759000  425704517.0    14340567.0
90           IND           India  0.640000  407771615.0     7247327.0
157          RUS          Russia  0.816000  132888951.0     2131571.0
150          PER            Peru  0.599490   74882695.0     3020038.0
125          MEX          Mexico  0.774000   74347548.0     7295850.0
178          ESP           Spain  0.887969   73717676.0     5510624.0
175          ZAF    South Africa  0.608653   63027659.0     1357682.0
42           COL        Colombia  0.581847   60543682.0     1936134.0
199          GBR  United Kingdom  0.922000   59475032.0     7249573.0

     Stringency Index  Population
200          3.350949   19.617637
27           3.136028   19.174732
90           3.610552   21.045353
157          3.380088   18.798668
150          3.430126   17.311165
125          3.019289   18.674802
178          3.393922   17.660427
175          3.364333   17.898266
42           3.357923   17.745037
199          3.353883   18.033340
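
Sorting and then taking the head can also be written as a single step with pandas' nlargest. The sketch below uses a few rows copied from the table above, deliberately shuffled; the frame name ranked is illustrative.

```python
import pandas as pd

# A few rows from the aggregated table, in arbitrary order.
ranked = pd.DataFrame({
    "Country": ["Peru", "India", "United States", "Russia", "Brazil"],
    "Total Cases": [74882695.0, 407771615.0, 746014098.0,
                    132888951.0, 425704517.0],
})

# nlargest(n, column) returns the n rows with the largest values, sorted descending.
top3 = ranked.nlargest(3, "Total Cases")
print(top3)
```

With the full frame, aggregated_data.nlargest(10, "Total Cases") would select the same ten countries as the sort-then-head approach.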

Now I will add two more columns (GDP per capita before Covid-19, GDP per capita during
Covid-19) to this dataset:

data["GDP Before Covid"] = [65279.53, 8897.49, 2100.75,
                            11497.65, 7027.61, 9946.03,
                            29564.74, 6001.40, 6424.98, 42354.41]
data["GDP During Covid"] = [63543.58, 6796.84, 1900.71,
                            10126.72, 6126.87, 8346.70,
                            27057.16, 5090.72, 5332.77, 40284.64]
print(data)
    Country Code         Country       HDI  Total Cases  Total Deaths  \
200          USA   United States  0.924000  746014098.0    26477574.0
27           BRA          Brazil  0.759000  425704517.0    14340567.0
90           IND           India  0.640000  407771615.0     7247327.0
157          RUS          Russia  0.816000  132888951.0     2131571.0
150          PER            Peru  0.599490   74882695.0     3020038.0
125          MEX          Mexico  0.774000   74347548.0     7295850.0
178          ESP           Spain  0.887969   73717676.0     5510624.0
175          ZAF    South Africa  0.608653   63027659.0     1357682.0
42           COL        Colombia  0.581847   60543682.0     1936134.0
199          GBR  United Kingdom  0.922000   59475032.0     7249573.0

     Stringency Index  Population  GDP Before Covid  GDP During Covid
200          3.350949   19.617637          65279.53          63543.58
27           3.136028   19.174732           8897.49           6796.84
90           3.610552   21.045353           2100.75           1900.71
157          3.380088   18.798668          11497.65          10126.72
150          3.430126   17.311165           7027.61           6126.87
125          3.019289   18.674802           9946.03           8346.70
178          3.393922   17.660427          29564.74          27057.16
175          3.364333   17.898266           6001.40           5090.72
42           3.357923   17.745037           6424.98           5332.77
199          3.353883   18.033340          42354.41          40284.64

NOTE: THE DATA ABOUT THE GDP PER CAPITA IS COLLECTED MANUALLY.
Analyzing the Spread of Covid-19
Now let’s start by analyzing the spread of Covid-19 in the countries with the highest number of
cases. First, let’s look at the total cases in each of them:

figure = px.bar(data, y='Total Cases', x='Country',
                title="Countries with Highest Covid Cases")
figure.show()

The USA has a far higher number of Covid-19 cases than Brazil and India, which are in second and
third place. Now let’s have a look at the total number of deaths among the countries with the
highest number of Covid-19 cases:

figure = px.bar(data, y='Total Deaths', x='Country',
                title="Countries with Highest Deaths")
figure.show()

As with the total number of Covid-19 cases, the USA leads in deaths, with Brazil in second
place. Mexico, the United Kingdom and India follow with almost identical totals. One thing to
notice here is that the death rate in India, Russia, and South Africa is comparatively low
relative to their total number of cases. Now let’s compare the total number of cases and total
deaths in all these countries:

fig = go.Figure()
fig.add_trace(go.Bar(
    x=data["Country"],
    y=data["Total Cases"],
    name='Total Cases',
    marker_color='indianred'
))
fig.add_trace(go.Bar(
    x=data["Country"],
    y=data["Total Deaths"],
    name='Total Deaths',
    marker_color='lightsalmon'
))
fig.update_layout(barmode='group', xaxis_tickangle=-45)
fig.show()

Now let’s have a look at the percentage of total deaths and total cases among all the countries
with the highest number of covid-19 cases:

# PERCENTAGE OF TOTAL CASES AND DEATHS
cases = data["Total Cases"].sum()
deceased = data["Total Deaths"].sum()

labels = ["Total Cases", "Total Deaths"]
values = [cases, deceased]

fig = px.pie(data, values=values, names=labels,
             title='Percentage of Total Cases and Deaths', hole=0.5)
fig.show()

death_rate = (data["Total Deaths"].sum() / data["Total Cases"].sum()) * 100
print("Death Rate = ", death_rate)
Death Rate = 3.6144212045653767
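
The figure above is a single rate for all ten countries combined. The same ratio can be computed per country, which makes differences between countries visible. A minimal sketch with four rows copied from the table earlier in this report; the column name "Death Rate (%)" is introduced here for illustration only.

```python
import pandas as pd

# Rows copied from the top-10 table in this report.
totals = pd.DataFrame({
    "Country": ["United States", "Brazil", "India", "Russia"],
    "Total Cases": [746014098.0, 425704517.0, 407771615.0, 132888951.0],
    "Total Deaths": [26477574.0, 14340567.0, 7247327.0, 2131571.0],
})

# Deaths as a percentage of cases, computed row by row.
totals["Death Rate (%)"] = totals["Total Deaths"] / totals["Total Cases"] * 100
print(totals[["Country", "Death Rate (%)"]].round(2))
```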

Another important column in this dataset is the stringency index. It is a composite measure of
response indicators, including school closures, workplace closures, and travel bans. It shows
how strict each country’s measures to control the spread of Covid-19 are:

fig = px.bar(data, x='Country', y='Total Cases',
             hover_data=['Population', 'Total Deaths'],
             color='Stringency Index', height=400,
             title="Stringency Index during Covid-19")
fig.show()

Here we can see that India has the highest stringency index among these countries during the
outbreak of Covid-19.

ANALYZING COVID-19 IMPACTS ON ECONOMY:

Now let’s analyze the impact of Covid-19 on the economy. GDP per capita is the primary measure
used here for the economic slowdown caused by the outbreak of Covid-19. Let’s have a look at the
GDP per capita before the outbreak among the countries with the highest number of Covid-19
cases:

fig = px.bar(data, x='Country', y='Total Cases',
             hover_data=['Population', 'Total Deaths'],
             color='GDP Before Covid', height=400,
             title="GDP Per Capita Before Covid-19")
fig.show()

Now let’s have a look at the GDP per capita during the rise in the cases of covid-19:

fig = px.bar(data, x='Country', y='Total Cases',
             hover_data=['Population', 'Total Deaths'],
             color='GDP During Covid', height=400,
             title="GDP Per Capita During Covid-19")
fig.show()

Now let’s compare the GDP per capita before covid-19 and during covid-19 to have a look at the
impact of covid-19 on the GDP per capita:

fig = go.Figure()
fig.add_trace(go.Bar(
    x=data["Country"],
    y=data["GDP Before Covid"],
    name='GDP Per Capita Before Covid-19',
    marker_color='indianred'
))
fig.add_trace(go.Bar(
    x=data["Country"],
    y=data["GDP During Covid"],
    name='GDP Per Capita During Covid-19',
    marker_color='lightsalmon'
))
fig.update_layout(barmode='group', xaxis_tickangle=-45)
fig.show()

You can see a drop in GDP per capita in all the countries with the highest number of covid-19
cases.
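
The size of the drop can be quantified as a percentage change, (during - before) / before * 100. The sketch below applies that formula to the manually collected figures listed earlier in this report; the frame name gdp and the column name "Change (%)" are introduced here for illustration.

```python
import pandas as pd

# GDP per capita figures as entered manually earlier in this report.
gdp = pd.DataFrame({
    "Country": ["United States", "Brazil", "India", "Russia", "Peru",
                "Mexico", "Spain", "South Africa", "Colombia", "United Kingdom"],
    "GDP Before Covid": [65279.53, 8897.49, 2100.75, 11497.65, 7027.61,
                         9946.03, 29564.74, 6001.40, 6424.98, 42354.41],
    "GDP During Covid": [63543.58, 6796.84, 1900.71, 10126.72, 6126.87,
                         8346.70, 27057.16, 5090.72, 5332.77, 40284.64],
})

# Percentage change relative to the pre-Covid value; negative means a drop.
gdp["Change (%)"] = (
    (gdp["GDP During Covid"] - gdp["GDP Before Covid"])
    / gdp["GDP Before Covid"] * 100
)

# Largest drops first.
print(gdp.sort_values("Change (%)")[["Country", "Change (%)"]].round(2))
```

Comparing relative rather than absolute changes puts countries with very different GDP per capita levels on the same scale.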

Another important economic factor is the Human Development Index (HDI). It is a statistical
composite index of life expectancy, education, and per capita income indicators. Let’s have a
look at the HDI of the countries with the highest number of Covid-19 cases:

fig = px.bar(data, x='Country', y='Total Cases',
hover_data=['Population', 'Total Deaths'],
color='HDI', height=400,
title="Human Development Index during Covid-19")
fig.show()
So this is how we can analyze the spread of Covid-19 and its impact on the economy.

CONCLUSION
In this task, we analyzed the spread of Covid-19 among countries and its impact on the global
economy. We saw that the United States recorded the highest number of Covid-19 cases and
deaths. One possible reason is the stringency index of the United States, which is comparatively
low for its population. We also analyzed how the GDP per capita of each of the selected
countries was affected during the outbreak of Covid-19.
