Python Project 1

Life Expectancy

Nguyễn Phương Vũ Nguyên 1.0
Trần Nguyễn Quỳnh Anh 1.0
Nguyễn Trần Hoàng Phúc 1.0
Nguyễn Vương Minh 1.0
Đỗ Lê Huy 1.0
Hoàng Xuân Phước 1.0

12 BFA & BBA

02/06/2024
Nguyễn Phương Vũ Nguyên (Group Leader): Plot 3 - 4; Check content
Trần Nguyễn Quỳnh Anh: Plot 5; Design Report; Write description
Nguyễn Trần Hoàng Phúc: Plot 1; Design Report; Check content
Nguyễn Vương Minh: Plot 6; Write description; Check grammar; Re-check plot 1-3
Đỗ Lê Huy: Plot 7; Write description; Re-check plot 9-10
Hoàng Xuân Phước: Plot 8 - 10; Check grammar + content; Re-check plot 6 - 8
Python Project 1
Vietnamese - German University

LIFE EXPECTANCY
&
SOCIO-ECONOMIC
WORLD BANK

Instructor : Dr. Do Duc Tan

Thursday Morning Class
Group 12
LIFE EXPECTANCY
& SOCIO-ECONOMIC

DATASET
BASIC
INFORMATION
- SHRITEJ SHRIKANT CHAVAN -

Introduction
The "Life Expectancy & Socio-Economic" dataset provides information on
various socio-economic factors and their impact on life expectancy
across different countries and regions. With 16 columns and 3307 rows
of data, it provides a multifaceted view of factors influencing human
health and well-being.

Link
Life expectancy & Socio-Economic (world bank) dataset link:
https://www.kaggle.com/datasets/mjshri23/life-expectancy-and-socio-economic-world-bank/data

Why we chose this dataset
The "Life Expectancy & Socio-Economic" dataset offers a rich and
comprehensive exploration of the interplay between various socio-
economic indicators and life expectancy across different countries
and regions. This dataset is particularly intriguing due to its breadth,
covering aspects such as income groups, health expenditure,
education expenditure, unemployment rates, and prevalence of
undernourishment, among others.

We aim to explore the correlation between income levels and quality
of life from this dataset. Additionally, we seek to establish
connections between income levels and the prevalence of diseases,
shedding light on how socio-economic factors influence health
outcomes. Our approach involves comparing regions and countries
with diverse socio-economic profiles to understand how variations in
income impact quality of life and health outcomes, as well as to
assess the effectiveness of health and education expenditure in
reducing diseases and unemployment.

Description of Variables related to this dataset

1. Country: 174 countries
2. Country Code: 3-letter code
3. Region: region of the world
4. Income Group: country’s income class
5. Year: 2000-2019 (both included)
6. Life expectancy: data
7. Prevalence of Undernourishment (% of the population):
Prevalence of undernourishment is the percentage of the
population whose habitual food consumption is insufficient
to provide the dietary energy levels that are required to
maintain a normally active and healthy life
8. Carbon dioxide emissions (kiloton): Carbon dioxide
emissions are those stemming from the burning of fossil
fuels and the manufacture of cement. They include carbon
dioxide produced during the consumption of solid, liquid,
and gas fuels and gas flaring
9. Health Expenditure (% of GDP):
Level of current health expenditure expressed as a
percentage of GDP. Estimates of current health expenditures
include healthcare goods and services consumed during
each year. This indicator does not include capital health
expenditures such as buildings, machinery, IT, and stocks of
vaccines for emergencies or outbreaks.

10. Education Expenditure (% of GDP):

General government expenditure on education (current,
capital, and transfers) is expressed as a percentage of GDP. It
includes expenditures funded by transfers from international
sources to the government. General government usually
refers to local, regional, and central governments.

11. Unemployment (% total labor force):

Unemployment refers to the % share of the labor force that is
without work but available for and seeking employment

12. Corruption (CPIA rating):

Transparency, accountability, and corruption in the public
sector assesses the extent to which the executive can be held
accountable for its use of funds and for the results of its
actions by the electorate and by the legislature and judiciary,
and the extent to which public employees within the
executive are required to account for administrative
decisions, use of resources, and results obtained.
13. Sanitation - People using safely managed sanitation
services (% of the population): The percentage of people
using improved sanitation facilities that are not shared with
other households and where excreta are safely disposed of
in situ or transported and treated offsite. Improved
sanitation facilities include flush/pour flush to piped sewer
systems, septic tanks, or pit latrines: ventilated improved pit
latrines, composting toilets, or pit latrines with slabs.
Source: WHO/UNICEF Joint Monitoring Programme (JMP) for Water
Supply, Sanitation and Hygiene (washdata.org).

14. Disability-Adjusted Life Years (DALYs) due to Injuries:

One DALY represents the loss of the equivalent of one year
of full health. DALYs for an injury or health condition are the
sum of the years of life lost due to premature mortality
(YLLs) and the years lived with a disability (YLDs) due to
prevalent cases of the injury or condition in a population.

15. Disability-Adjusted Life Years (DALYs) due to Communicable diseases:

One DALY represents the loss of
the equivalent of one year of full health. DALYs for a
communicable disease or health condition are the sum of the
years of life lost due to premature mortality (YLLs) and the
years lived with a disability (YLDs) due to prevalent cases of
the disease in a population.

16. Disability-Adjusted Life Years (DALYs) due to Non-Communicable diseases:

One DALY represents the loss of
the equivalent of one year of full health. DALYs for a non-
communicable disease or health condition is the sum of the
years of life lost due to premature mortality (YLLs) and the
years lived with a disability (YLDs) due to prevalent cases of
the disease in a population.
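
The DALY definition used in variables 14 to 16 reduces to a simple sum. The sketch below only illustrates that arithmetic with made-up numbers; the function name `dalys` and its arguments are our own assumptions and do not come from the dataset.

```python
def dalys(yll: float, yld: float) -> float:
    """Return disability-adjusted life years as YLL + YLD."""
    if yll < 0 or yld < 0:
        raise ValueError("YLL and YLD must be non-negative")
    return yll + yld


# Hypothetical population: 3,000 years of life lost to premature
# mortality and 1,500 years lived with a disability.
total = dalys(3000.0, 1500.0)
print(total)
```

In the dataset itself the three DALY columns already hold this sum, so no such calculation is needed before plotting.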
Table of
CONTENTS
01 Life Expectancy by Income Group
02 Average DALYs due to various factors
03 The percentage of Income Groups of different Regions
04 Occurrences of corruption of different income groups by years
05 Health and Education Expenditure of countries in different regions
06 Prevalence of Undernourishment across various income groups
07 Average Education Expenditure and Unemployment by Region
08 Average CO2 Emissions of different regions from 2001 to 2019
09 Distribution of average Unemployment of different years in different regions
10 Average Sanitation across different regions and income groups
LIFE EXPECTANCY BY
INCOME GROUP

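import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset (adjust the path to your local copy of the CSV)
Socio_Economic_and_Life_expectancy = pd.read_csv(
    'C:\\Users\\Phucn\\Documents\\Python\\1.csv')

# Drop rows missing either variable; .copy() avoids a
# SettingWithCopyWarning when the column is reassigned below
Plot1 = Socio_Economic_and_Life_expectancy.dropna(
    subset=["IncomeGroup", "Life Expectancy World Bank"]).copy()

# Order the income groups from lowest to highest
Plot1['IncomeGroup'] = pd.Categorical(
    Plot1['IncomeGroup'],
    categories=["Low income", "Lower middle income",
                "Upper middle income", "High income"],
    ordered=True)

# Set the style before creating the figure so that it takes effect
sns.set_style("whitegrid")
plt.figure(figsize=(10, 6))

# scale='width' gives every violin the same maximum width
# (newer seaborn releases name this parameter density_norm)
sns.violinplot(data=Plot1, x='IncomeGroup', y='Life Expectancy World Bank',
               scale='width', inner='quartile', palette='PuBu')

plt.title('Life Expectancy by Income Groups', color='blue')
plt.ylabel('Life Expectancy')
plt.xlabel('')

# Hide the legend; the x-axis already names the groups
plt.legend([], [], frameon=False)

plt.show()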
LIFE EXPECTANCY BY
INCOME GROUP

The violin plot illustrates the distribution of life expectancy
across different income groups. Each violin represents a
specific income group, and its width at a given height indicates
how many observations in that group have that life expectancy.
Because the plot uses scale='width', every violin has the same
maximum width, so widths should be compared within a violin
rather than between violins.

From the plot, we can observe a clear trend of increasing
life expectancy as we move from lower-income to higher-
income groups. Countries in the high-income group exhibit
the highest life expectancy, with a concentration around 82
years of age. Conversely, countries in the low-income group
show a wider range of life expectancy, with values spanning
from 50 to 65 years. The lower-middle income group tends
to have a median life expectancy of around 70 years, while
the upper-middle income group exhibits a median of
around 73 years.

Overall, the plot highlights that countries in higher income
groups generally have higher life expectancy than those
with lower income, as evidenced by the violins of the
higher-income categories sitting higher on the y-axis.
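
The medians quoted above were read from the plot. As a rough sketch of how such values could be checked numerically, the example below computes the quartiles per income group on made-up toy data; the column names follow the plotting code, but the numbers are purely illustrative.

```python
import pandas as pd

# Toy data (made up); the column names follow the plotting code.
toy = pd.DataFrame({
    "IncomeGroup": ["Low income"] * 4 + ["High income"] * 4,
    "Life Expectancy World Bank": [50.0, 55.0, 60.0, 65.0,
                                   78.0, 80.0, 82.0, 84.0],
})

# Quartiles per income group: the same statistics that
# inner='quartile' draws as lines inside each violin.
quartiles = (
    toy.groupby("IncomeGroup")["Life Expectancy World Bank"]
       .quantile([0.25, 0.5, 0.75])
       .unstack()
)
print(quartiles)
```

Replacing `toy` with the filtered dataset would give the actual quartiles for each income group.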
Average DALYs due to various factors
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

Socio_Economic_and_Life_expectancy = pd.read_csv(
    'C:\\Users\\HP\\Desktop\\VGU\\Python\\Project_1\\life expectancy.csv')

# Keep only rows that have an income group and all three DALY columns
filtered_data = Socio_Economic_and_Life_expectancy.dropna(
    subset=['IncomeGroup', 'Communicable', 'NonCommunicable', 'Injuries'])

# Mean DALYs per income group for each cause
avg_diseases_and_injuries_by_income = filtered_data.groupby('IncomeGroup').agg({
    'Communicable': 'mean',
    'NonCommunicable': 'mean',
    'Injuries': 'mean'
}).reset_index()

# Reshape to long format: one row per (income group, cause)
avg_diseases_and_injuries = pd.melt(
    avg_diseases_and_injuries_by_income, id_vars='IncomeGroup',
    value_vars=['Communicable', 'NonCommunicable', 'Injuries'],
    var_name='DiseaseType', value_name='AverageCount')

# Order the income groups from lowest to highest
income_levels = ["Low income", "Lower middle income", "Upper middle income", "High income"]
avg_diseases_and_injuries['IncomeGroup'] = pd.Categorical(
    avg_diseases_and_injuries['IncomeGroup'],
    categories=income_levels, ordered=True)

sns.set_theme(style="whitegrid")
palette = {
    'Communicable': 'red',
    'NonCommunicable': 'blue',
    'Injuries': 'yellow'
}

# One bar chart per cause, each with its own y-axis
g = sns.catplot(
    data=avg_diseases_and_injuries,
    x='IncomeGroup', y='AverageCount', hue='DiseaseType',
    kind='bar', col='DiseaseType', col_wrap=3, sharey=False,
    palette=palette
)

for ax in g.axes.flat:
    # Label every non-empty bar with its value
    for p in ax.patches:
        if p.get_height() > 0:
            ax.annotate(f'{p.get_height():.2f}',
                        (p.get_x() + p.get_width() / 2., p.get_height()),
                        ha='center', va='center', xytext=(0, 10),
                        textcoords='offset points')
    # Show y-axis values with thousands separators
    ax.yaxis.set_major_formatter(FuncFormatter(lambda x, _: f'{int(x):,}'))
    # Rotate the income group labels so that they do not overlap
    plt.setp(ax.get_xticklabels(), rotation=45, ha='right')

g.set_axis_labels("", "Average DALYs")
g.set_titles("{col_name}")
g.fig.subplots_adjust(top=0.9)
g.fig.suptitle("Average DALYs due to various factors")

# hue and col are the same variable, so a legend would be redundant
if g._legend is not None:
    g._legend.remove()
plt.show()
Average DALYs due to various factors

This plot illustrates the average number of DALYs caused by
communicable diseases, non-communicable diseases, and injuries
for each income group.

As seen from the plot, DALYs caused by injuries are far lower than
DALYs caused by diseases. The highest average number of healthy years
lost to injuries is around 2,200,000, in the Lower-middle income
group.

Communicable diseases account for the highest average number of
healthy years lost, with countries in the lower income groups being
affected the most. The Low and Lower-middle income groups have an
average of 7,740,000 and 11,800,000 DALYs due to communicable
diseases respectively. However, Lower-middle income countries also
face the highest average number of DALYs due to non-communicable
diseases, at around 11,186,362.52.

On the other hand, communicable diseases have little presence in
higher income groups, as shown by the small number of average
DALYs in the Upper-middle income and High income groups:
Upper-middle income has around 1,600,000 average DALYs and High
income only around 282,000. Conversely, non-communicable diseases
still affect higher income groups, with around 9,160,000 average
DALYs for Upper-middle income and close to 4,730,000 average DALYs
for High income.

Overall, the plot shows that lower income countries are affected by
both communicable and non-communicable diseases, whereas higher
income countries lose most DALYs to non-communicable diseases,
suggesting that these countries have a better quality of life
than poorer ones. Moreover, injuries do not contribute as many
DALYs as diseases in any income group.
The percentage of Income Groups
of different Regions
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

file_path = 'C:\\Users\\HP\\Desktop\\VGU\\Python\\Project_1\\life expectancy.csv'

Socio_Economic_and_Life_expectancy = pd.read_csv(file_path)

# Count the rows for every combination of region and income group
region_income_counts = pd.crosstab(Socio_Economic_and_Life_expectancy['Region'],
                                   Socio_Economic_and_Life_expectancy['IncomeGroup'])

# Reshape to long format: one row per (region, income group)
region_income_dataframe = region_income_counts.reset_index().melt(
    id_vars='Region', var_name='IncomeGroup', value_name='Count')

# Drop empty combinations, then convert counts to percentages within each region
region_income_dataframe = region_income_dataframe[
    region_income_dataframe['Count'] != 0].copy()
region_income_dataframe['Percentage'] = region_income_dataframe.groupby(
    'Region')['Count'].transform(lambda x: x / x.sum() * 100)

income_group_colors = {
    'High income': 'red',
    'Upper middle income': 'blue',
    'Lower middle income': 'green',
    'Low income': 'yellow'
}
sns.set(style="whitegrid")
g = sns.FacetGrid(region_income_dataframe, col="Region", col_wrap=2,
                  sharex=False, sharey=False)


def pie_plot(data, **kwargs):
    """Draw one pie chart for the region contained in `data`."""
    data = data.sort_values('IncomeGroup')
    colors = [income_group_colors[group] for group in data['IncomeGroup']]
    wedges, texts, autotexts = plt.pie(
        data['Percentage'], labels=data['IncomeGroup'], autopct='%1.0f%%',
        colors=colors, textprops={'color': "black"}, startangle=90)
    plt.setp(autotexts, size=10)
    plt.gca().set_aspect('equal')


g.map_dataframe(pie_plot)
g.set_titles("{col_name}")
g.fig.suptitle("The percentage of Income Groups of different Regions", y=1.05)

# Build the legend handles by hand, one coloured marker per income group
unique_labels = list(region_income_dataframe['IncomeGroup'].unique())
unique_handles = [plt.Line2D([0], [0], marker='o', color='w',
                             markerfacecolor=income_group_colors[label],
                             markersize=10) for label in unique_labels]
g.fig.legend(unique_handles, unique_labels, title="Income Group",
             loc="center left", bbox_to_anchor=(1, 0.5))

# Subtitle placed in figure coordinates
plt.text(0.5, 0.95, "Income groups throughout different regions",
         horizontalalignment='center', fontsize=14,
         transform=g.fig.transFigure)

plt.subplots_adjust(top=0.90, right=0.85)
plt.show()
The percentage of Income Groups
of different Regions

The pie charts above illustrate the distribution of income groups
within each region. Each pie chart represents a specific region,
with slices representing different income groups. The size of each
slice corresponds to the proportion of countries within that income
group relative to the total number of countries in the region.

From the plot, we can see that East Asia & Pacific, Europe &
Central Asia, Latin America & Caribbean and Middle East & North
Africa have no Low income countries, while North America has 100%
of its countries in the High income group.

On the other hand, the majority of Sub-Saharan African countries are
in the lower income groups: 50% are in the Low income group and 34%
in the Lower-middle income group, with only 2% in the High income
group. South Asia has no countries in the High income category, and
75% of South Asian countries are in the Lower-middle income group.

Overall, the plot shows that the Low income group is absent from the
majority of regions, and that North America consists only of High
income countries. Low income countries are most prevalent in
Sub-Saharan Africa, and Lower-middle income countries in South Asia.
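
The cross-tabulation in the code counts rows, and each country contributes one row per year. As a hedged sketch of how the shares could be computed with every country counted exactly once, the example below uses made-up toy data; the column name `Country` is an assumption about how the dataset labels countries.

```python
import pandas as pd

# Toy country-year rows (made up). "Country" is an assumed column name.
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "Region": ["R1", "R1", "R1", "R1", "R2", "R2", "R2", "R2"],
    "IncomeGroup": ["Low income", "Low income", "High income", "High income",
                    "High income", "High income", "High income", "High income"],
})

# Keep one row per country so that each country is counted once,
# regardless of how many years it appears in the data.
countries = toy.drop_duplicates(subset="Country")

# Row-normalised cross-tabulation: each region's row sums to 100 (%).
share = pd.crosstab(countries["Region"], countries["IncomeGroup"],
                    normalize="index") * 100
print(share)
```

When every country has the same number of yearly rows, counting rows and counting countries give the same percentages, so the two approaches only differ if some countries have missing years.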
OCCURRENCES OF CORRUPTION
OF DIFFERENT INCOME GROUPS
BY YEARS
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Socio_Economic_and_Life_expectancy = pd.read_csv(
    'C:\\Users\\HP\\Desktop\\VGU\\Python\\Project_1\\life expectancy.csv')

# Turn the numeric ratings into text labels in ascending order;
# rows without a rating are labelled 'N/A' and placed last
ratings = Socio_Economic_and_Life_expectancy['Corruption']
rating_order = [f'{r:g}' for r in sorted(ratings.dropna().unique())] + ['N/A']
Socio_Economic_and_Life_expectancy['Corruption'] = ratings.map(
    lambda r: 'N/A' if pd.isna(r) else f'{r:g}')

# Order the income groups from lowest to highest
income_order = ["Low income", "Lower middle income", "Upper middle income", "High income"]
Socio_Economic_and_Life_expectancy['IncomeGroup'] = pd.Categorical(
    Socio_Economic_and_Life_expectancy['IncomeGroup'],
    categories=income_order, ordered=True)

# One count plot per income group
g = sns.catplot(
    data=Socio_Economic_and_Life_expectancy,
    x='Corruption',
    order=rating_order,
    hue='IncomeGroup',
    kind='count',
    palette='viridis',
    col='IncomeGroup',
    col_wrap=2,
    height=4,
    aspect=1,
    legend=False
)

# Label each non-empty bar with its count
for ax in g.axes.flatten():
    for c in ax.containers:
        labels = [f'{int(v.get_height())}' if v.get_height() > 0 else ''
                  for v in c]
        ax.bar_label(c, labels=labels, label_type='edge', padding=2, fontsize=10)

g.set_axis_labels("Corruption Rating", "Countries by Year Occurrences")
g.set_titles("{col_name}")
g.fig.suptitle("Instances of corruption of different Income Groups by years", y=0.99)

# Rotate the rating labels so that they do not overlap
for ax in g.axes.flatten():
    ax.tick_params(axis='x', rotation=45)

plt.show()
OCCURRENCES OF CORRUPTION
OF DIFFERENT INCOME GROUPS
BY YEARS

The plot depicts the number of observations falling under each
Corruption Rating, grouped by Income Group. The x-axis represents
the Corruption Rating, while the y-axis represents the number of
occurrences, counting each country once for every year in the data.

From the plot, we can observe that there is considerable variation
in corruption ratings across different income groups. The most
noticeable point is that all 1,083 High income observations have
no corruption rating, and the Upper-middle income group has only
a few observations with a corruption rating.

On the other hand, the Lower-middle income group has a large number
of observations with corruption ratings from 1 to 4.5, the most
common being a rating of 3 with 188 observations. The Low income
group also has 96 and 93 observations with ratings of 2.5 and 3
respectively.

Overall, the plot shows that the higher income groups have few or
no recorded corruption ratings, whereas the lower income groups
have ratings at various levels. A missing rating means that no
rating was recorded, so it should not be read as an absence of
corruption.
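
Since so many observations have no rating, it helps to measure how much of each income group is missing before interpreting the counts. The following is a minimal sketch on made-up toy data; only the column names `IncomeGroup` and `Corruption` are taken from the plotting code.

```python
import numpy as np
import pandas as pd

# Toy data (made up); NaN marks an observation without a rating.
toy = pd.DataFrame({
    "IncomeGroup": ["High income", "High income", "Low income", "Low income"],
    "Corruption": [np.nan, np.nan, 2.5, np.nan],
})

# Fraction of observations without a rating, per income group.
missing_share = toy["Corruption"].isna().groupby(toy["IncomeGroup"]).mean()
print(missing_share)
```

A share of 1.0 for a group means that no observation in it has a rating, which is the situation the plot shows for the High income group.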
HEALTH AND EDUCATION EXPENDITURE OF
COUNTRIES IN DIFFERENT REGIONS

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

file_path = 'C:\\Users\\HP\\Desktop\\VGU\\Python\\Project_1\\life expectancy.csv'
Socio_Economic_and_Life_expectancy = pd.read_csv(file_path)

# Keep only rows where both expenditure values and the region are present
filtered_data = Socio_Economic_and_Life_expectancy[
    ['Health Expenditure %', 'Education Expenditure %', 'Region']].dropna()

# Assign one colour per region
regions = filtered_data['Region'].unique()
palette = dict(zip(regions, sns.color_palette("tab10", len(regions))))

# One scatter plot per region, each with its own axis ranges
g = sns.FacetGrid(filtered_data, col="Region", col_wrap=3,
                  height=4, sharex=False, sharey=False)
g.map_dataframe(sns.scatterplot, x="Health Expenditure %",
                y="Education Expenditure %", hue="Region", palette=palette,
                legend=False)
g.set_titles(col_template="{col_name}")
plt.subplots_adjust(top=0.9)
g.fig.suptitle("Health and Education Expenditure of Countries in Different Regions",
               fontsize=16)

plt.show()
HEALTH AND EDUCATION EXPENDITURE OF
COUNTRIES IN DIFFERENT REGIONS
This scatter plot visualizes the relationship between Health
Expenditure % and Education Expenditure % across different
regions. Each point corresponds to one country in one year and
shows its health expenditure and education expenditure as a
percentage of GDP.

From the plot above, we can see that in most regions countries
spend between 2% and 10% of GDP on each of Health and Education.
These regions include Sub-Saharan Africa, East Asia & Pacific,
Europe & Central Asia and Latin America & Caribbean. South Asia
and Middle East & North Africa, however, show more varied
spending on Health and Education.

North America spends more on Health than other regions, with all
countries spending from 9% to around 17%. On the other hand, East
Asia & Pacific has the highest Education Expenditure, with some
countries spending from 15% up to 20% of their GDP.

Overall, the plot shows that countries tend to prioritize spending
on Health rather than on Education. Moreover, most regions spend up
to around 10% of their GDP on Education, but some countries in
East Asia & Pacific spend from 10% to 20% of their GDP on
Education.
Prevalence of Undernourishment
across various income groups
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Socio_Economic_and_Life_expectancy = pd.read_csv(
    'C:\\Users\\HP\\Desktop\\VGU\\Python\\Project_1\\life expectancy.csv')

# The column name is misspelled ("Prevelance") in the dataset itself
filtered_data = Socio_Economic_and_Life_expectancy.dropna(
    subset=['IncomeGroup', 'Prevelance of Undernourishment'])

# Show the facets from the lowest to the highest income group
income_group_order = ["Low income", "Lower middle income",
                      "Upper middle income", "High income"]

custom_palette = {"Low income": "red", "Lower middle income": "blue",
                  "Upper middle income": "yellow", "High income": "green"}

# One density curve per income group
g = sns.FacetGrid(filtered_data, col="IncomeGroup", hue="IncomeGroup", aspect=1.5,
                  height=3, palette=custom_palette, col_order=income_group_order,
                  legend_out=True)

g.map(sns.kdeplot, "Prevelance of Undernourishment", fill=True, alpha=0.6,
      bw_adjust=0.5)

for ax in g.axes.flat:
    ax.set_xlabel('Prevalence of Undernourishment (%)')
    ax.set_xlim(0, 60)
    ax.set_title('')

g.set_axis_labels("", "")
g.fig.suptitle("Prevalence of Undernourishment across various income groups", y=0.99)

g.add_legend(title='Income Group')

# Caption placed in figure coordinates so that it stays inside the figure
g.fig.text(0.5, 0.01, 'Ridgeline Plot', fontsize=14, ha='center')

plt.show()
Prevalence of Undernourishment
across various income groups

The ridgeline plot visualizes the distribution of the prevalence
of undernourishment across different income groups.

Examining the plot reveals that higher income groups have a
lower prevalence of undernourishment than lower income
groups. The High income group is concentrated at about 2%
undernourishment, and the Upper-middle income group at about
3%, with very small ridges spanning from 16% to 25%.

On the other hand, the Lower-middle income group has a
moderate ridge extending to 26%. The Low income group is
more severely affected, with ridges spanning from 10% to 40%
and even tiny ridges at 70%.

In conclusion, the general trend shown in this plot is that
higher income groups have a lower prevalence of
undernourishment.
Average Education Expenditure vs
Average Unemployment Rate by Region
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Socio_Economic_and_Life_expectancy = pd.read_csv(
    r"C:\Users\Admin\OneDrive\Documents\Python\life expectancy.csv")
filtered_data = Socio_Economic_and_Life_expectancy[
    ['Year', 'Unemployment', 'Education Expenditure %', 'Region']].dropna()

# Average both variables per year and region
avg_data = filtered_data.groupby(['Year', 'Region']).agg(
    avg_unemployment=('Unemployment', 'mean'),
    avg_education_expenditure=('Education Expenditure %', 'mean')
).reset_index()

# Assign one colour per region
palette = sns.color_palette('tab10', avg_data['Region'].nunique())
region_colors = dict(zip(avg_data['Region'].unique(), palette))


def scatter_with_color(data, **kwargs):
    """Draw the points and a fitted line for the region in `data`."""
    region = data['Region'].iloc[0]
    color = region_colors[region]
    sns.scatterplot(data=data, x='avg_education_expenditure',
                    y='avg_unemployment', color=color)
    sns.regplot(data=data, x='avg_education_expenditure', y='avg_unemployment',
                scatter=False, ci=None, color=color, line_kws={"lw": 1.2})


g = sns.FacetGrid(avg_data, col="Region", col_wrap=4, sharex=False, sharey=False)
g.map_dataframe(scatter_with_color)

g.set_axis_labels("Average Education Expenditure (%)", "Average Unemployment Rate (%)")
g.set_titles("{col_name}")

plt.subplots_adjust(top=0.92)
g.fig.suptitle('Average Education Expenditure vs Average Unemployment Rate by Region',
               fontsize=16)
plt.show()
Average Education Expenditure vs
Average Unemployment Rate by Region
The plot illustrates how the average unemployment rate relates to the
average education expenditure percentage across different regions over
several years. Each point represents a specific year within a region.

The plot shows that the direction of the relationship between average
education expenditure and average unemployment differs by region: it is
negative in some regions, positive in others, and almost absent in East
Asia & Pacific.

For Sub-Saharan Africa, the general trend is that years with higher
education expenditure are also years with lower unemployment. Some years
with average education expenditure between 3.25% and 4.25% have
unemployment from 9% to nearly 11%, but most years in that range have
only around 6% to 8% unemployment. The same trend also appears in South
Asia and Latin America & Caribbean.

Europe & Central Asia, Middle East & North Africa and North America,
however, show a positive correlation between average education
expenditure and average unemployment, the most notable being Middle East
& North Africa. There, years with average education expenditure between
4.25% and 4.75% have an unemployment rate of around 6% to 7%, while years
with more than 4.75% tend to have a higher unemployment rate, including
one year with 6% education expenditure and 11% unemployment.

Overall, the plot suggests that education expenditure has only a small
impact on unemployment rates in most of the regions and that other
factors play a part in affecting unemployment.
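
The positive, negative and near-zero relationships described above were judged by eye from the fitted lines. As a sketch of how they could be quantified, the example below computes a Pearson correlation per region on made-up toy data; the column names mirror `avg_data` from the code, while the region labels and values are invented.

```python
import pandas as pd

# Toy yearly averages (made up) for two hypothetical regions.
toy = pd.DataFrame({
    "Region": ["R1"] * 4 + ["R2"] * 4,
    "avg_education_expenditure": [3.0, 3.5, 4.0, 4.5, 3.0, 3.5, 4.0, 4.5],
    "avg_unemployment": [9.0, 8.0, 7.0, 6.0, 5.0, 6.0, 7.0, 8.0],
})

# Pearson correlation between the two averages, computed per region.
corr_by_region = {
    region: group["avg_education_expenditure"].corr(group["avg_unemployment"])
    for region, group in toy.groupby("Region")
}
print(corr_by_region)
```

A value near -1 indicates that unemployment falls as spending rises, a value near +1 the opposite, and a value near 0 almost no linear relationship. A correlation alone does not show that one variable causes the other.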
AVERAGE CO2 EMISSIONS OF DIFFERENT
REGIONS FROM 2001 TO 2019
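import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# The original snippet did not show how the data was loaded; the file
# name below is an assumption, adjust it to your local copy of the CSV
file_path = 'life expectancy.csv'
Socio_Economic_and_Life_expectancy = pd.read_csv(file_path)
filtered_data = Socio_Economic_and_Life_expectancy[['Year', 'CO2', 'Region']].dropna()

# Average CO2 emissions per year and region
avg_co2 = filtered_data.groupby(['Year', 'Region']).agg(
    avg_CO2=('CO2', 'mean')
).reset_index()

# Assign one colour per region
palette = sns.color_palette('tab10', avg_co2['Region'].nunique())
region_colors = dict(zip(avg_co2['Region'].unique(), palette))


def lineplot_with_color(data, **kwargs):
    """Draw the yearly average for the region in `data` in its own colour."""
    region = data['Region'].iloc[0]
    color = region_colors[region]
    sns.lineplot(data=data, x='Year', y='avg_CO2', color=color)


# sharey=False: every region gets its own y-axis range
g = sns.FacetGrid(avg_co2, col="Region", col_wrap=4, sharey=False, height=4)
g.map_dataframe(lineplot_with_color)

g.set_axis_labels("Year", "Average CO2 (Kilotons)")
g.set_titles("{col_name}")
g.fig.suptitle("Average CO2 Emissions of Different Regions from 2001 to 2019",
               fontsize=16)

# Show y-axis values with thousands separators
for ax in g.axes.flat:
    ax.yaxis.set_major_formatter(FuncFormatter(lambda x, _: f'{int(x):,}'))

plt.subplots_adjust(left=0.1, top=0.92)
plt.show()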
AVERAGE CO2 EMISSIONS OF DIFFERENT
REGIONS FROM 2001 TO 2019
The plot visualizes the average CO2 emissions per country over time
across different regions. Each line represents the average CO2
emissions for a specific region.

Looking at the graph, we can see that almost all regions display an
upward trend in CO2 emissions, with only North America and Europe &
Central Asia showing a downward trend.

Among the regions with rising emissions, East Asia & Pacific has the
highest average, reaching about 650,000 kilotons in 2019. On the
other hand, Sub-Saharan Africa has the lowest CO2 emissions: even
on an upward trend, it reaches only about 18,500 kilotons.

North America and Europe & Central Asia are different from other
regions because their CO2 emissions are decreasing. North America
has the highest average of any region in these figures, at around
3,100,000 kilotons in 2015, falling to around 2,700,000 kilotons by
the end of 2019. Likewise, Europe & Central Asia went from a
reported level of around 105,000 kilotons to around 87,000
kilotons in 2019.

In conclusion, the graph shows that almost all regions increased
their CO2 emissions from 2001 to 2019, possibly for economic
reasons. Only North America and Europe & Central Asia show a
downward trend.
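
The upward and downward trends can also be expressed as a percentage change between the first and the last year. The sketch below does this on made-up toy data; the column names follow `avg_co2` from the code, and the regions and values are invented for illustration.

```python
import pandas as pd

# Toy regional averages (made up) for the first and last year.
toy = pd.DataFrame({
    "Year": [2001, 2019, 2001, 2019],
    "Region": ["R1", "R1", "R2", "R2"],
    "avg_CO2": [100.0, 150.0, 200.0, 180.0],
})

# One row per region, one column per year.
wide = toy.pivot(index="Region", columns="Year", values="avg_CO2")

# Percentage change from the earliest to the latest year in the data.
first, last = wide.columns.min(), wide.columns.max()
pct_change = (wide[last] - wide[first]) / wide[first] * 100
print(pct_change)
```

A positive value corresponds to a region whose average emissions rose over the period, and a negative value to a region whose emissions fell.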
Distribution of average Unemployment of
different years in different regions

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Socio_Economic_and_Life_expectancy = pd.read_csv(
    'C:\\Users\\huydo\\OneDrive\\Desktop\\Python\\life expectancy.csv')
filtered_data = Socio_Economic_and_Life_expectancy[
    ['Year', 'Unemployment', 'Region']].dropna()

# Average unemployment per year and region
avg_unemployment = (
    filtered_data.groupby(['Year', 'Region'], as_index=False)
    .agg({'Unemployment': 'mean'})
    .rename(columns={'Unemployment': 'avg_unemployment'})
)

# FacetGrid creates its own figure, so no separate plt.figure() call is needed
g = sns.FacetGrid(avg_unemployment, col="Region", col_wrap=4, sharey=False)
g.map_dataframe(sns.histplot, x='avg_unemployment', binwidth=0.5, kde=False,
                alpha=0.7)
g.set_titles("{col_name}")

g.fig.subplots_adjust(top=0.9)
g.fig.suptitle("Distribution of average unemployment of different years in different regions",
               fontsize=16)

# Give the bars of each region their own colour
num_regions = avg_unemployment['Region'].nunique()
colors = sns.color_palette("viridis", num_regions)

for ax, color in zip(g.axes.flatten(), colors):
    for patch in ax.patches:
        patch.set_facecolor(color)

# Replace the per-facet axis labels with one shared label per axis
for ax in g.axes.flatten():
    ax.set_xlabel('')
    ax.set_ylabel('')
g.fig.text(0.5, 0.02, 'Average Unemployment Rate', ha='center', fontsize=12)
g.fig.text(0.02, 0.5, 'Number of Years', va='center', rotation='vertical',
           fontsize=12)
plt.show()
Distribution of average Unemployment of
different years in different regions

The histogram visualizes the distribution of average

unemployment rates across different regions over a specified
time frame. Each facet represents a distinct region, allowing for
a comparative analysis of unemployment trends. The x-axis
depicts the average unemployment rate, while the y-axis
indicates the frequency of occurrence within each bin.

Upon examination, we can see that the East Asia & Pacific
region has the lowest average rate of unemployment, with the
majority of the countries having an average of 2.6 to nearly 4%
of their labor forces. Alongside the East Asia & Pacific region,
South Asia also exhibit a low average unemployment rate with a
concentration of around 5%.

On the other hand, Sub-Saharan Africa, while not having the

highest unemployment rate, the region have most of its
countries having an average from 7.6% to 8.75%. Whereas
Europe & Central Asia have an average unemployment rate
ranging from around 6.1% to the highest average
unemployment rate of any region, nearly 11.5%.

Overall, the plot shows that Sub-Saharan Africa has a
consistently high average unemployment rate, while Europe &
Central Asia records the highest average unemployment rate of
any region.
Average Sanitation across different
regions and income groups
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset from a local CSV file
Socio_Economic_and_Life_expectancy = pd.read_csv(
    'C:\\Users\\huydo\\OneDrive\\Desktop\\Python\\life expectancy.csv')

# Keep only the columns needed for this plot and drop incomplete rows
filtered_data = Socio_Economic_and_Life_expectancy[['Sanitation', 'Region',
                                                    'IncomeGroup']].dropna()

# Map each income group to a fixed color
income_groups = filtered_data['IncomeGroup'].unique()
colors = ['red', 'blue', 'green', 'yellow']
if len(income_groups) > len(colors):
    raise ValueError("Not enough colors defined for the number of unique IncomeGroups")
palette_dict = dict(zip(income_groups, colors[:len(income_groups)]))

# Apply the theme before drawing so that it affects this figure
sns.set_theme(style="whitegrid")
plt.figure(figsize=(12, 8))
sns.boxplot(data=filtered_data, x='Region', y='Sanitation', hue='IncomeGroup',
            palette=palette_dict)
plt.xlabel('')
plt.ylabel('Sanitation %')
plt.title('Average Sanitation across different regions and Income Groups')
plt.xticks(rotation=45, ha='right')
plt.legend(title='Income Group')
plt.tight_layout()
plt.show()
Average Sanitation across different
regions and income groups
The box plot shows how sanitation levels vary across regions and
income groups. The x-axis lists the regions, while the y-axis shows
the percentage of the population with access to safe sanitation
services, with one box per income group in each region.

Beginning with the broad overview, it is clear that High income
countries have the best access to safe sanitation.

Upon closer examination, almost all High income countries across
regions have more than 75% of their population using safe
sanitation services. North American countries have the highest
range, from 87.5% to nearly 100%. Unlike in other regions, however,
High income countries in Latin America & Caribbean range from
only around 31% to 62.5%.

On the other hand, lower income groups have sanitation coverage
ranging from 18.75% to 62.5%. Notably, Sub-Saharan Africa has an
even lower range, from around 13% to slightly above 25%. Low income
countries in Africa have the lowest range of all, from 7% to
around 17%.

In conclusion, the plot shows that countries in higher income groups
tend to have a larger share of their population with access to safe
sanitation services. It also highlights the severe lack of sanitation
in countries throughout Sub-Saharan Africa.
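
The ranges quoted in this description are read off the boxes by eye. As a minimal sketch of how they could be checked numerically, the quartiles behind each box can be computed with pandas. The values below are made up for illustration and are not taken from the dataset; only the column names Region, IncomeGroup and Sanitation follow the code above.

```python
import pandas as pd

# Made-up sanitation values (%), for illustration only
sample = pd.DataFrame({
    'Region': ['North America'] * 4 + ['Sub-Saharan Africa'] * 4,
    'IncomeGroup': ['High income'] * 4 + ['Low income'] * 4,
    'Sanitation': [88.0, 92.0, 96.0, 100.0, 6.0, 10.0, 14.0, 18.0],
})

# First quartile, median and third quartile per region and income group:
# the bottom edge, middle line and top edge of each box in a box plot
quartiles = (
    sample.groupby(['Region', 'IncomeGroup'])['Sanitation']
          .quantile([0.25, 0.5, 0.75])
          .unstack()
)
print(quartiles)
```

Running the same groupby on filtered_data instead of the sample frame would give the figures for the real dataset, which could then replace the approximate ranges read from the plot.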

