Python Project 1

Life Expectancy

Nguyễn Phương Vũ Nguyên: 1.0
Trần Nguyễn Quỳnh Anh: 1.0
Nguyễn Trần Hoàng Phúc: 1.0
Nguyễn Vương Minh: 1.0
Đỗ Lê Huy: 1.0
Hoàng Xuân Phước: 1.0

12 BFA & BBA


02/06/2024
Nguyễn Phương Vũ Nguyên (Group Leader): Plot 3 - 4; Check content
Trần Nguyễn Quỳnh Anh: Plot 5; Design Report; Write description
Nguyễn Trần Hoàng Phúc: Plot 1; Design Report; Check content
Nguyễn Vương Minh: Plot 6; Write description; Check grammar; Re-check plot 1 - 3
Đỗ Lê Huy: Plot 7; Write description; Re-check plot 9 - 10
Hoàng Xuân Phước: Plot 8 - 10; Check grammar + content; Re-check plot 6 - 8
Vietnamese - German University

LIFE EXPECTANCY & SOCIO-ECONOMIC (WORLD BANK)

Instructor : Dr. Do Duc Tan


Thursday Morning Class
Group 12
LIFE EXPECTANCY & SOCIO-ECONOMIC DATASET: BASIC INFORMATION
- SHRITEJ SHRIKANT CHAVAN -

Introduction
The "Life Expectancy & Socio-Economic" dataset provides information on
various socio-economic factors and their impact on life expectancy
across different countries and regions. With 16 columns and 3,307 rows
of data, it offers a multifaceted view of the factors influencing human
health and well-being.

Link
Life Expectancy & Socio-Economic (World Bank) dataset link:
https://www.kaggle.com/datasets/mjshri23/life-expectancy-and-socio-economic-world-bank/data
Why we chose this dataset
The "Life Expectancy & Socio-Economic" dataset offers a rich and
comprehensive view of the interplay between socio-economic indicators
and life expectancy across different countries and regions. It is
particularly interesting for its breadth, covering income groups,
health expenditure, education expenditure, unemployment rates, and
prevalence of undernourishment, among others.

We aim to explore the correlation between income levels and quality
of life in this dataset. We also seek to connect income levels with
the prevalence of diseases, shedding light on how socio-economic
factors influence health outcomes. Our approach compares regions and
countries with diverse socio-economic profiles to understand how
variations in income affect quality of life and health outcomes, and
to assess how effective health and education expenditure are in
reducing diseases and unemployment.

Description of Variables related to this dataset


1. Country: 174 countries
2. Country Code: 3-letter code
3. Region: region of the world
4. Income Group: country's income class
5. Year: 2000-2019 (both included)
6. Life Expectancy: life expectancy data
7. Prevalence of Undernourishment (% of the population):
Prevalence of undernourishment is the percentage of the
population whose habitual food consumption is insufficient
to provide the dietary energy levels that are required to
maintain a normally active and healthy life.
8. Carbon dioxide emissions (kiloton): Carbon dioxide
emissions are those stemming from the burning of fossil
fuels and the manufacture of cement. They include carbon
dioxide produced during the consumption of solid, liquid,
and gas fuels and gas flaring.
9. Health Expenditure (% of GDP):
Level of current health expenditure expressed as a
percentage of GDP. Estimates of current health expenditures
include healthcare goods and services consumed during
each year. This indicator does not include capital health
expenditures such as buildings, machinery, IT, and stocks of
vaccines for emergencies or outbreaks.

10. Education Expenditure (% of GDP):
General government expenditure on education (current,
capital, and transfers), expressed as a percentage of GDP. It
includes expenditures funded by transfers from international
sources to the government. General government usually
refers to local, regional, and central governments.

11. Unemployment (% of total labor force):
Unemployment refers to the share of the labor force that is
without work but available for and seeking employment.

12. Corruption (CPIA rating):
Transparency, accountability, and corruption in the public
sector assesses the extent to which the executive can be held
accountable for its use of funds and for the results of its
actions by the electorate and by the legislature and judiciary,
and the extent to which public employees within the
executive are required to account for administrative
decisions, use of resources, and results obtained.

13. Sanitation - People using safely managed sanitation
services (% of the population): The percentage of people
using improved sanitation facilities that are not shared with
other households and where excreta are safely disposed of
in situ or transported and treated offsite. Improved
sanitation facilities include flush/pour flush to piped sewer
systems, septic tanks, or pit latrines; ventilated improved pit
latrines; composting toilets; or pit latrines with slabs.
Source: WHO/UNICEF Joint Monitoring Programme (JMP) for Water
Supply, Sanitation and Hygiene (washdata.org).

14. Disability-Adjusted Life Years (DALYs) due to Injuries:
One DALY represents the loss of the equivalent of one year
of full health. DALYs for an injury or health condition are the
sum of the years of life lost due to premature mortality
(YLLs) and the years lived with a disability (YLDs) due to
prevalent cases of the injury or health condition in a population.

15. Disability-Adjusted Life Years (DALYs) due to
Communicable diseases: One DALY represents the loss of
the equivalent of one year of full health. DALYs for a
communicable disease or health condition are the sum of the
years of life lost due to premature mortality (YLLs) and the
years lived with a disability (YLDs) due to prevalent cases of
the disease in a population.

16. Disability-Adjusted Life Years (DALYs) due to
Non-Communicable diseases: One DALY represents the loss of
the equivalent of one year of full health. DALYs for a
non-communicable disease or health condition are the sum of the
years of life lost due to premature mortality (YLLs) and the
years lived with a disability (YLDs) due to prevalent cases of
the disease in a population.
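
The DALY definition used in variables 14 to 16 is a simple sum of two components. The sketch below illustrates that arithmetic only; the function name and the input figures are hypothetical and are not taken from the dataset.

```python
def dalys(years_of_life_lost: float, years_lived_with_disability: float) -> float:
    """Return DALYs as the sum of YLLs and YLDs, following the definition above."""
    if years_of_life_lost < 0 or years_lived_with_disability < 0:
        raise ValueError("YLLs and YLDs must be non-negative")
    return years_of_life_lost + years_lived_with_disability


# Hypothetical figures for illustration only (not from the dataset)
example_total = dalys(years_of_life_lost=1200.0, years_lived_with_disability=300.0)
print(example_total)
```

The dataset already stores the summed DALY values, so this calculation is only needed to interpret them, not to reproduce them.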
Table of CONTENTS
01 Life Expectancy by Income Group
02 Average DALYs due to various factors
03 The percentage of Income Groups of different Regions
04 Occurrences of corruption of different income groups by years
05 Health and Education Expenditure of countries in different regions
06 Prevalence of Undernourishment across various income groups
07 Average Education Expenditure and Unemployment by Region
08 Average CO2 Emissions of different regions from 2001 to 2019
09 Distribution of average Unemployment of different years in different regions
10 Average Sanitation across different regions and income groups
LIFE EXPECTANCY BY
INCOME GROUP

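import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Set the style before drawing; calling it after plotting does not affect this figure
sns.set_style("whitegrid")

Socio_Economic_and_Life_expectancy = pd.read_csv('C:\\Users\\Phucn\\Documents\\Python\\1.csv')

# Keep only rows that have both an income group and a life expectancy value.
# .copy() avoids a SettingWithCopyWarning on the assignment below.
Plot1 = Socio_Economic_and_Life_expectancy.dropna(
    subset=["IncomeGroup", "Life Expectancy World Bank"]).copy()

# Order the income groups from lowest to highest on the x-axis
Plot1['IncomeGroup'] = pd.Categorical(
    Plot1['IncomeGroup'],
    categories=["Low income", "Lower middle income", "Upper middle income", "High income"],
    ordered=True)

plt.figure(figsize=(10, 6))
# scale='width' gives every violin the same maximum width
# (newer seaborn releases call this parameter density_norm)
sns.violinplot(data=Plot1, x='IncomeGroup', y='Life Expectancy World Bank',
               scale='width', inner='quartile', palette='PuBu')

plt.title('Life Expectancy by Income Groups', color='blue')
plt.ylabel('Life Expectancy')
plt.xlabel('')

plt.legend([], [], frameon=False)
plt.show()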
LIFE EXPECTANCY BY
INCOME GROUP

The violin plot illustrates the distribution of life expectancy
across income groups. Each violin represents one income group, and
its width at a given height shows how densely the observations are
concentrated at that life expectancy value.

The plot shows a clear trend of increasing life expectancy from
lower-income to higher-income groups. Countries in the High income
group have the highest life expectancy, concentrated around 82
years. Countries in the Low income group show a wider spread, with
values ranging from 50 to 65 years. The Lower middle income group
has a median life expectancy of around 70 years, and the Upper
middle income group a median of around 73 years.

Overall, the plot highlights that countries in higher income
groups generally have a higher life expectancy than those in lower
income groups, as shown by violins that sit higher on the life
expectancy axis.
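
The medians and quartiles read from the violins can be cross-checked numerically. This is a minimal sketch on made-up values that only reuses the column names from the plotting code; the numbers are not from the real dataset.

```python
import pandas as pd

# Toy data for illustration only; the values are NOT from the real dataset
toy = pd.DataFrame({
    "IncomeGroup": ["Low income", "Low income", "Low income",
                    "High income", "High income", "High income"],
    "Life Expectancy World Bank": [52.0, 58.0, 64.0, 79.0, 81.0, 83.0],
})

# Lower quartile, median and upper quartile per income group
# (the same three lines that inner='quartile' draws inside each violin)
summary = (
    toy.groupby("IncomeGroup")["Life Expectancy World Bank"]
       .quantile([0.25, 0.5, 0.75])
       .unstack()
)
print(summary)
```

Running the same `groupby` on the full data frame would give the exact medians behind the approximate values quoted in the description.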
Average DALYs due to various factors
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

Socio_Economic_and_Life_expectancy = pd.read_csv(
    'C:\\Users\\HP\\Desktop\\VGU\\Python\\Project_1\\life expectancy.csv')

# Drop rows missing the income group or any of the three DALY columns
filtered_data = Socio_Economic_and_Life_expectancy.dropna(
    subset=['IncomeGroup', 'Communicable', 'NonCommunicable', 'Injuries'])

# Mean DALYs per income group for each cause
avg_diseases_and_injuries_by_income = filtered_data.groupby('IncomeGroup').agg({
    'Communicable': 'mean',
    'NonCommunicable': 'mean',
    'Injuries': 'mean'
}).reset_index()

# Reshape from wide to long: one row per (income group, cause, mean)
avg_diseases_and_injuries = pd.melt(
    avg_diseases_and_injuries_by_income, id_vars='IncomeGroup',
    value_vars=['Communicable', 'NonCommunicable', 'Injuries'],
    var_name='DiseaseType', value_name='AverageCount')

# Order the income groups from lowest to highest on the x-axis
income_levels = ["Low income", "Lower middle income", "Upper middle income", "High income"]
avg_diseases_and_injuries['IncomeGroup'] = pd.Categorical(
    avg_diseases_and_injuries['IncomeGroup'], categories=income_levels, ordered=True)

sns.set_theme(style="whitegrid")
palette = {
    'Communicable': 'red',
    'NonCommunicable': 'blue',
    'Injuries': 'yellow'
}
g = sns.catplot(
    data=avg_diseases_and_injuries,
    x='IncomeGroup', y='AverageCount', hue='DiseaseType',
    kind='bar', col='DiseaseType', col_wrap=3, sharey=False,
    palette=palette
)

for ax in g.axes.flat:
    # Label each bar with its value
    for p in ax.patches:
        if p.get_height() > 0:
            ax.annotate(f'{p.get_height():.2f}',
                        (p.get_x() + p.get_width() / 2., p.get_height()),
                        ha='center', va='center', xytext=(0, 10), textcoords='offset points')
    # Show thousands separators on the y-axis
    ax.yaxis.set_major_formatter(FuncFormatter(lambda x, _: f'{int(x):,}'))
    # Rotate the income group labels so they do not overlap
    plt.setp(ax.get_xticklabels(), rotation=45, ha='right')

g.set_axis_labels("", "Average DALYs")
g.set_titles("{col_name}")
g.fig.subplots_adjust(top=0.9)
g.fig.suptitle("Average DALYs due to various factors")

# The facet titles already name each cause, so the legend is redundant
if g._legend is not None:
    g._legend.remove()
plt.show()
Average DALYs due to various factors

This plot illustrates the average number of DALYs caused by
communicable diseases, non-communicable diseases, and injuries for
each income group.

As the plot shows, DALYs caused by injuries are much less prevalent
than DALYs caused by diseases. The highest average number of healthy
years lost to injuries is around 2,200,000, in the Lower middle income
group.

Communicable diseases account for the highest average number of
healthy years lost, and countries in the lower income groups are
affected the most. The Low and Lower middle income groups have an
average of 7,740,000 and 11,800,000 DALYs due to communicable
diseases respectively. Lower middle income countries also face the
highest average number of DALYs due to non-communicable diseases, at
around 11,190,000.

Communicable diseases, on the other hand, have little presence in the
higher income groups, as shown by the small average DALYs in the
Upper middle income and High income groups. Upper middle income
has around 1,600,000 average DALYs and High income only about
282,000. Non-communicable diseases still affect the higher income
groups, with around 9,160,000 average DALYs for Upper middle income
and close to 4,730,000 for High income.

Overall, the plot shows that lower income countries are affected by
both communicable and non-communicable diseases, whereas higher
income countries lose most of their DALYs to non-communicable
diseases, suggesting that these countries have a better quality of life
than poorer ones. Moreover, injuries do not account for as many DALYs
as diseases in any income group.
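
The comparison of causes within each income group can also be made without a chart. The following is a minimal sketch on made-up numbers (in arbitrary units, not the real averages) that finds the dominant cause per income group; only the column names are taken from the plotting code.

```python
import pandas as pd

# Toy data for illustration only; the values are NOT from the real dataset
toy = pd.DataFrame({
    "IncomeGroup": ["Low income", "Low income", "High income", "High income"],
    "Communicable": [8.0, 6.0, 0.2, 0.4],
    "NonCommunicable": [5.0, 3.0, 4.0, 6.0],
    "Injuries": [1.0, 1.0, 0.5, 0.5],
})

causes = ["Communicable", "NonCommunicable", "Injuries"]

# Mean DALYs per income group, one column per cause
means = toy.groupby("IncomeGroup")[causes].mean()

# For each income group, the cause with the largest mean
dominant_cause = means.idxmax(axis=1)
print(dominant_cause)
```

In this toy example the Low income group is dominated by communicable diseases and the High income group by non-communicable ones, which mirrors the pattern described above.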
The percentage of Income Groups
of different Regions
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

file_path = 'C:\\Users\\HP\\Desktop\\VGU\\Python\\Project_1\\life expectancy.csv'
Socio_Economic_and_Life_expectancy = pd.read_csv(file_path)

# Count the rows (one per country and year) in each Region x IncomeGroup cell
region_income_counts = pd.crosstab(Socio_Economic_and_Life_expectancy['Region'],
                                   Socio_Economic_and_Life_expectancy['IncomeGroup'])

region_income_dataframe = region_income_counts.reset_index().melt(
    id_vars='Region', var_name='IncomeGroup', value_name='Count')

# Drop empty cells; .copy() avoids a SettingWithCopyWarning on the next assignment
region_income_dataframe = region_income_dataframe[region_income_dataframe['Count'] != 0].copy()
# Share of each income group within its region, in percent
region_income_dataframe['Percentage'] = region_income_dataframe.groupby('Region')['Count'].transform(
    lambda x: x / x.sum() * 100)

income_group_colors = {
    'High income': 'red',
    'Upper middle income': 'blue',
    'Lower middle income': 'green',
    'Low income': 'yellow'
}
sns.set(style="whitegrid")
g = sns.FacetGrid(region_income_dataframe, col="Region", col_wrap=2, sharex=False, sharey=False)

def pie_plot(data, **kwargs):
    # Draw one pie per region, with a fixed color for each income group
    data = data.sort_values('IncomeGroup')
    colors = [income_group_colors[group] for group in data['IncomeGroup']]
    wedges, texts, autotexts = plt.pie(data['Percentage'], labels=data['IncomeGroup'],
                                       autopct='%1.0f%%', colors=colors,
                                       textprops={'color': "black"}, startangle=90)
    plt.setp(autotexts, size=10)
    plt.gca().set_aspect('equal')

g.map_dataframe(pie_plot)
g.set_titles("{col_name}")
g.fig.suptitle("The percentage of Income Groups of different Regions", y=1.05)

# Build the legend by hand, because plt.pie does not register legend handles
unique_labels = list(region_income_dataframe['IncomeGroup'].unique())
unique_handles = [plt.Line2D([0], [0], marker='o', color='w',
                             markerfacecolor=income_group_colors[label], markersize=10)
                  for label in unique_labels]
g.fig.legend(unique_handles, unique_labels, title="Income Group",
             loc="center left", bbox_to_anchor=(1, 0.5))

plt.text(0.5, 0.95, "Income groups throughout different regions",
         horizontalalignment='center', fontsize=14, transform=g.fig.transFigure)

plt.subplots_adjust(top=0.90, right=0.85)
plt.show()
The percentage of Income Groups
of different Regions

The pie charts above illustrate how the income groups are
distributed across regions. Each pie represents one region, and
each slice represents an income group. The size of a slice
corresponds to the proportion of countries in that income group
relative to the total number of countries in the region.

From the plot, we can see that East Asia & Pacific, Europe &
Central Asia, Latin America & Caribbean and Middle East & North
Africa have no Low income countries, while North America has 100%
of its countries in the High income group.

On the other hand, the majority of Sub-Saharan African countries
are in the lower income groups: 50% are in the Low income group and
34% in the Lower middle income group, with only 2% in the High
income group. South Asia has no countries in the High income
category, and 75% of South Asian countries are in the Lower middle
income group.

Overall, the plot shows that the Low income group is absent from
the majority of regions, and that North America consists only of
High income countries. Low income countries are most prevalent in
Sub-Saharan Africa, and Lower middle income countries in South Asia.
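
The percentages in the pies are computed from a row count, and each row is one country in one year, so the shares are strictly shares of observations. The sketch below shows, on made-up data, how row-based and country-based shares can differ. The column name `Country Name` is an assumption for illustration; the real file may label it differently.

```python
import pandas as pd

# Toy data for illustration only; "Country Name" is a hypothetical column name
toy = pd.DataFrame({
    "Country Name": ["A", "A", "B", "C", "C", "C"],
    "Region": ["R1", "R1", "R1", "R2", "R2", "R2"],
    "IncomeGroup": ["Low income", "Low income", "High income",
                    "High income", "High income", "High income"],
})

# Row-based shares: every country-year row counts once
row_share = pd.crosstab(toy["Region"], toy["IncomeGroup"], normalize="index") * 100

# Country-based shares: every country counts once per income group it appears in
countries = toy.drop_duplicates(subset=["Country Name", "Region", "IncomeGroup"])
country_share = pd.crosstab(countries["Region"], countries["IncomeGroup"],
                            normalize="index") * 100

print(row_share.round(1))
print(country_share.round(1))
```

If every country has the same number of yearly rows and never changes income group, the two tables agree; otherwise the country-based version matches the wording "proportion of countries" more closely.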
OCCURRENCES OF CORRUPTION
OF DIFFERENT INCOME GROUPS
BY YEARS
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Socio_Economic_and_Life_expectancy = pd.read_csv(
    'C:\\Users\\HP\\Desktop\\VGU\\Python\\Project_1\\life expectancy.csv')

# Turn the ratings into text labels and mark missing ratings as 'N/A'.
# The explicit order keeps the x-axis sorted by rating, with 'N/A' last.
ratings = sorted(Socio_Economic_and_Life_expectancy['Corruption'].dropna().unique())
corruption_order = [str(r) for r in ratings] + ['N/A']
Socio_Economic_and_Life_expectancy['Corruption'] = (
    Socio_Economic_and_Life_expectancy['Corruption']
    .map(lambda v: 'N/A' if pd.isna(v) else str(v)))

income_order = ["Low income", "Lower middle income", "Upper middle income", "High income"]
Socio_Economic_and_Life_expectancy['IncomeGroup'] = pd.Categorical(
    Socio_Economic_and_Life_expectancy['IncomeGroup'],
    categories=income_order, ordered=True)

# One count plot per income group; each row counted is one country in one year
g = sns.catplot(
    data=Socio_Economic_and_Life_expectancy,
    x='Corruption',
    order=corruption_order,
    hue='IncomeGroup',
    kind='count',
    palette='viridis',
    col='IncomeGroup',
    col_wrap=2,
    height=4,
    aspect=1,
    legend=False
)

for ax in g.axes.flatten():
    # Label each non-empty bar with its count
    for c in ax.containers:
        labels = [f'{int(v.get_height())}' if v.get_height() > 0 else '' for v in c]
        ax.bar_label(c, labels=labels, label_type='edge', padding=2, fontsize=10)
    ax.tick_params(axis='x', rotation=45)

g.set_axis_labels("Corruption Rating", "Countries by Year Occurrences")
g.set_titles("{col_name}")
g.fig.suptitle("Instances of corruption of different Income Groups by years", y=0.99)

plt.show()
OCCURRENCES OF CORRUPTION
OF DIFFERENT INCOME GROUPS
BY YEARS

The plot depicts the number of observations falling under each
Corruption Rating category, grouped by Income Group. The x-axis
represents the Corruption Rating, while the y-axis represents the
number of country-year observations (one per country and year).

From the plot, we can observe considerable variation in corruption
ratings across income groups. The most noticeable point is that all
1,083 High income observations have no corruption rating, and only
a few observations in the Upper middle income group have one.

On the other hand, the Lower middle income group has a large
number of observations with ratings from 1 to 4.5, the most common
being a rating of 3 with 188 observations. The Low income group
has 96 and 93 observations with ratings of 2.5 and 3 respectively.

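Overall, the plot shows that corruption ratings are recorded almost
only for the lower income groups, where they range from 1 to 4.5.
Because the higher income groups are mostly unrated, the plot cannot
show whether corruption is lower there: a missing rating means that
no rating was recorded, not that corruption is absent.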
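
Since many observations carry no corruption rating, it may help to check how many rows are actually rated in each income group before comparing them. This is a minimal sketch on made-up data; only the column names come from the plotting code.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; the values are NOT from the real dataset
toy = pd.DataFrame({
    "IncomeGroup": ["High income", "High income",
                    "Low income", "Low income", "Low income"],
    "Corruption": [np.nan, np.nan, 2.5, 3.0, np.nan],
})

grouped = toy.groupby("IncomeGroup")["Corruption"]

coverage = pd.DataFrame({
    "rated": grouped.count(),   # rows with a rating (NaN is excluded)
    "total": grouped.size(),    # all rows, rated or not
})
coverage["share_rated"] = coverage["rated"] / coverage["total"]
print(coverage)
```

A low `share_rated` for a group means the bars for that group describe very few observations, so comparisons with other groups should be made with care.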
HEALTH AND EDUCATION EXPENDITURE OF
COUNTRIES IN DIFFERENT REGIONS

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

file_path = 'C:\\Users\\HP\\Desktop\\VGU\\Python\\Project_1\\life expectancy.csv'
Socio_Economic_and_Life_expectancy = pd.read_csv(file_path)

# Keep only the three columns needed and drop rows with missing values
filtered_data = Socio_Economic_and_Life_expectancy[
    ['Health Expenditure %', 'Education Expenditure %', 'Region']].dropna()

# Assign one fixed color to each region
regions = filtered_data['Region'].unique()
palette = dict(zip(regions, sns.color_palette("tab10", len(regions))))

# One scatter plot per region; axes are not shared, so scales differ between panels
g = sns.FacetGrid(filtered_data, col="Region", col_wrap=3,
                  height=4, sharex=False, sharey=False)
g.map_dataframe(sns.scatterplot, x="Health Expenditure %",
                y="Education Expenditure %", hue="Region", palette=palette,
                legend=False)
g.set_titles(col_template="{col_name}")
plt.subplots_adjust(top=0.9)
g.fig.suptitle("Health and Education Expenditure of Countries in Different Regions",
               fontsize=16)

plt.show()
HEALTH AND EDUCATION EXPENDITURE OF
COUNTRIES IN DIFFERENT REGIONS
This scatter plot visualizes the relationship between Health
Expenditure % and Education Expenditure % across different
regions. Each point corresponds to one country in one year and
shows its health expenditure and education expenditure as a
percentage of GDP.

From the plot above, we can see that most countries spend between
2% and 10% of GDP on each of Health and Education. This holds for
Sub-Saharan Africa, East Asia & Pacific, Europe & Central Asia and
Latin America & Caribbean. South Asia and Middle East & North
Africa, however, show more varied Health and Education
expenditure.

North America spends more on Health than other regions, with all
countries spending from 9% to around 17%. On the other hand, East
Asia & Pacific has the highest Education Expenditure, with some
countries spending from 15% up to 20% of their GDP.

Overall, the plot shows that countries tend to prioritize spending on
Health over Education. Moreover, most regions spend up to around
10% of their GDP on Education, but some countries in East Asia &
Pacific spend from 10% to 20% of their GDP on Education.
Prevalence of Undernourishment
across various income groups
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Socio_Economic_and_Life_expectancy = pd.read_csv(
    'C:\\Users\\HP\\Desktop\\VGU\\Python\\Project_1\\life expectancy.csv')

# Note: 'Prevelance' is the spelling of the column name used in the data file
filtered_data = Socio_Economic_and_Life_expectancy.dropna(
    subset=['Prevelance of Undernourishment'])

income_group_order = filtered_data['IncomeGroup'].unique()

custom_palette = {"Low income": "red", "Lower middle income": "blue",
                  "Upper middle income": "yellow", "High income": "green"}

# One density curve per income group, each in its own panel
g = sns.FacetGrid(filtered_data, col="IncomeGroup", hue="IncomeGroup", aspect=1.5,
                  height=3, palette=custom_palette, col_order=income_group_order,
                  legend_out=True)

g.map(sns.kdeplot, "Prevelance of Undernourishment", fill=True, alpha=0.6, bw_adjust=0.5)

for ax in g.axes.flat:
    ax.set_xlabel('Prevalence of Undernourishment (%)')
    ax.set_xlim(0, 60)
    ax.set_title('')

g.set_axis_labels("", "")
g.fig.suptitle("Prevalence of Undernourishment across various income groups", y=0.99)

g.add_legend(title='Income Group')

# Place the caption in figure coordinates so that it stays inside the figure
g.fig.text(0.5, 0.01, 'Ridgeline Plot', fontsize=14, ha='center')

plt.show()
Prevalence of Undernourishment
across various income groups

The ridgeline plot visualizes the distribution of the prevalence of
undernourishment across income groups.

The plot reveals that higher income groups have a lower
prevalence of undernourishment than lower income groups. The
High income group is concentrated at about 2% undernourishment
and the Upper middle income group at about 3%, with very small
ridges spanning from 16% to 25%.

On the other hand, the Lower middle income group has a
moderate ridge extending up to 26%. The Low income group is
the most severely affected, with ridges spanning from 10% to
40% and even tiny ridges at 70%.

In conclusion, the general trend shown in this plot is that
higher income groups have a lower prevalence of
undernourishment.
Average Education Expenditure vs
Average Unemployment Rate by Region
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Socio_Economic_and_Life_expectancy = pd.read_csv(
    r"C:\Users\Admin\OneDrive\Documents\Python\life expectancy.csv")
filtered_data = Socio_Economic_and_Life_expectancy[
    ['Year', 'Unemployment', 'Education Expenditure %', 'Region']].dropna()

# Average both indicators over the countries of each region, per year
avg_data = filtered_data.groupby(['Year', 'Region']).agg(
    avg_unemployment=('Unemployment', 'mean'),
    avg_education_expenditure=('Education Expenditure %', 'mean')
).reset_index()

# Assign one fixed color to each region
palette = sns.color_palette('tab10', avg_data['Region'].nunique())
region_colors = dict(zip(avg_data['Region'].unique(), palette))

def scatter_with_color(data, **kwargs):
    # Draw the yearly points and a linear trend line in the region's color
    region = data['Region'].iloc[0]
    color = region_colors[region]
    sns.scatterplot(data=data, x='avg_education_expenditure',
                    y='avg_unemployment', color=color)
    sns.regplot(data=data, x='avg_education_expenditure', y='avg_unemployment',
                scatter=False, ci=None, color=color, line_kws={"lw": 1.2})

g = sns.FacetGrid(avg_data, col="Region", col_wrap=4, sharex=False, sharey=False)
g.map_dataframe(scatter_with_color)

g.set_axis_labels("Average Education Expenditure (%)", "Average Unemployment Rate (%)")
g.set_titles("{col_name}")

plt.subplots_adjust(top=0.92)
g.fig.suptitle('Average Education Expenditure vs Average Unemployment Rate by Region',
               fontsize=16)
plt.show()
Average Education Expenditure vs
Average Unemployment Rate by Region
The plot illustrates how the average unemployment rate relates to the
average education expenditure (% of GDP) across different regions over
several years. Each point represents a specific year within a region.

The relationship between average education expenditure and average
unemployment differs by region: it is negative in some regions, positive
in others, and close to zero in East Asia & Pacific.

For Sub-Saharan Africa, the general trend is that years with higher
education expenditure are also years with lower unemployment. Some years
with an average education expenditure between 3.25% and 4.25% have
unemployment from 9% to nearly 11%, but most years in that same
expenditure range have only around 6% to 8% unemployment. The same trend
appears for South Asia and Latin America & Caribbean.

Europe & Central Asia, Middle East & North Africa and North America,
however, show a positive correlation between average education
expenditure and average unemployment. The most notable is Middle East &
North Africa: years with an average education expenditure between 4.25%
and 4.75% have an unemployment rate of around 6% to 7%, while years above
4.75% tend to have a higher unemployment rate, including one year with 6%
education expenditure and 11% unemployment.

Overall, the plot shows that education expenditure has only a weak
relationship with unemployment rates in most regions, and that other
factors play a part in affecting unemployment.
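
The direction and strength of each regional trend can be summarized with a correlation coefficient instead of being read from the trend lines. This is a minimal sketch on made-up data with hypothetical region labels `R1` and `R2`; the column names follow the aggregated data frame in the plotting code.

```python
import pandas as pd

# Toy data for illustration only; the values are NOT from the real dataset
toy = pd.DataFrame({
    "Region": ["R1"] * 4 + ["R2"] * 4,
    "avg_education_expenditure": [3.0, 3.5, 4.0, 4.5, 3.0, 3.5, 4.0, 4.5],
    "avg_unemployment": [10.0, 9.0, 8.0, 7.0, 5.0, 6.0, 7.0, 8.0],
})

# Pearson correlation between the two yearly averages, computed per region
corr_by_region = {
    region: group["avg_education_expenditure"].corr(group["avg_unemployment"])
    for region, group in toy.groupby("Region")
}

for region, r in corr_by_region.items():
    print(f"{region}: r = {r:.2f}")
```

A value near -1 corresponds to the negative trend described for Sub-Saharan Africa, a value near +1 to the positive trend, and a value near 0 to no clear relationship. A correlation alone does not show that education spending causes the change in unemployment.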
AVERAGE CO2 EMISSIONS OF DIFFERENT
REGIONS FROM 2001 TO 2019
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

Socio_Economic_and_Life_expectancy = pd.read_csv(
    r"C:\Users\Admin\OneDrive\Documents\Python\life expectancy.csv")
filtered_data = Socio_Economic_and_Life_expectancy[['Year', 'CO2', 'Region']].dropna()

# Average CO2 emissions over the countries of each region, per year
avg_co2 = filtered_data.groupby(['Year', 'Region']).agg(
    avg_CO2=('CO2', 'mean')
).reset_index()

# Assign one fixed color to each region
palette = sns.color_palette('tab10', avg_co2['Region'].nunique())
region_colors = dict(zip(avg_co2['Region'].unique(), palette))

def lineplot_with_color(data, **kwargs):
    # Draw the region's yearly average in its own color
    region = data['Region'].iloc[0]
    color = region_colors[region]
    sns.lineplot(data=data, x='Year', y='avg_CO2', color=color)

# sharey=False: every panel has its own y-scale, so compare the axis values, not line heights
g = sns.FacetGrid(avg_co2, col="Region", col_wrap=4, sharey=False, height=4)
g.map_dataframe(lineplot_with_color)

g.set_axis_labels("Year", "Average CO2 (Kilotons)")
g.set_titles("{col_name}")
g.fig.suptitle("Average CO2 Emissions of Different Regions from 2001 to 2019", fontsize=16)

# Show thousands separators on the y-axis
for ax in g.axes.flat:
    ax.yaxis.set_major_formatter(FuncFormatter(lambda x, _: f'{int(x):,}'))

plt.subplots_adjust(left=0.1, top=0.92)
plt.show()
AVERAGE CO2 EMISSIONS OF DIFFERENT
REGIONS FROM 2001 TO 2019
The plot visualizes the average level of CO2 emissions over time
across different regions. Each line represents the average CO2
emissions of the countries in one region.

Looking at the graph, we can see that almost all regions display a
steep increase in CO2 emissions over the years, with only North
America and Europe & Central Asia showing a downward trend.

Among the regions with rising emissions, East Asia & Pacific has the
highest level, reaching about 650,000 kilotons in 2019. On the
other hand, Sub-Saharan Africa has the lowest CO2 emissions: even
on its upward trend, it reaches only about 18,500 kilotons.

North America and Europe & Central Asia differ from the other
regions because their CO2 emissions are decreasing. North America
reported around 3,100,000 kilotons in 2015, but by the end of 2019
the figure had fallen to 2,700,000 kilotons. The same goes for
Europe & Central Asia, which fell from 105,000 kilotons to around
87,000 kilotons in 2019.

In conclusion, the graph shows that almost all regions increased
their CO2 emissions from 2001 to 2019, a rise that is likely linked
to economic activity. Only North America and Europe & Central Asia
show a downward trend.
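
The size of the two declines can be expressed as a percentage change, which makes regions with very different emission levels easier to compare. The sketch below applies the standard percentage-change formula to the figures quoted in the description; the function name is hypothetical.

```python
def percent_change(start: float, end: float) -> float:
    """Return the change from start to end as a percentage of start."""
    if start == 0:
        raise ValueError("start must be non-zero")
    return (end - start) / start * 100


# Figures quoted in the description above (kilotons)
north_america = percent_change(3_100_000, 2_700_000)
europe_central_asia = percent_change(105_000, 87_000)

print(f"North America: {north_america:.1f}%")
print(f"Europe & Central Asia: {europe_central_asia:.1f}%")
```

With these quoted figures, the decline is roughly 13% for North America and roughly 17% for Europe & Central Asia, so the smaller emitter shows the larger relative drop.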
Distribution of average Unemployment of
different years in different regions

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Socio_Economic_and_Life_expectancy = pd.read_csv(
    'C:\\Users\\huydo\\OneDrive\\Desktop\\Python\\life expectancy.csv')
filtered_data = Socio_Economic_and_Life_expectancy[['Year', 'Unemployment', 'Region']].dropna()

# Average unemployment over the countries of each region, per year
avg_unemployment = (
    filtered_data.groupby(['Year', 'Region'], as_index=False)
    .agg({'Unemployment': 'mean'})
    .rename(columns={'Unemployment': 'avg_unemployment'}))

# FacetGrid creates its own figure, so no separate plt.figure() call is needed
g = sns.FacetGrid(avg_unemployment, col="Region", col_wrap=4, sharey=False)
g.map_dataframe(sns.histplot, x='avg_unemployment', binwidth=0.5, kde=False, alpha=0.7)
g.set_titles("{col_name}")

plt.subplots_adjust(top=0.9)
g.fig.suptitle("Distribution of average unemployment of different years in different regions",
               fontsize=16)

# Give each region's histogram its own color
num_regions = avg_unemployment['Region'].nunique()
colors = sns.color_palette("viridis", num_regions)

for ax, color in zip(g.axes.flatten(), colors):
    for patch in ax.patches:
        patch.set_facecolor(color)

# Replace the per-panel axis labels with one shared label per axis
for ax in g.axes.flatten():
    ax.set_xlabel('')
    ax.set_ylabel('')
g.fig.text(0.5, 0.02, 'Average Unemployment Rate', ha='center', fontsize=12)
g.fig.text(0.02, 0.5, 'Number of Years', va='center', rotation='vertical', fontsize=12)
plt.show()
Distribution of average Unemployment of
different years in different regions

The histograms visualize the distribution of yearly average
unemployment rates in each region over the period covered by the
data. Each facet represents one region, allowing a comparison of
unemployment levels between regions. The x-axis shows the average
unemployment rate, while the y-axis shows the number of years
falling within each bin.

The East Asia & Pacific region has the lowest average rate of
unemployment, with most years averaging between 2.6% and nearly
4% of the labor force. South Asia also exhibits a low average
unemployment rate, concentrated around 5%.

Sub-Saharan Africa, while not having the highest unemployment
rate, has most of its yearly averages between 7.6% and 8.75%.
Europe & Central Asia ranges from around 6.1% to nearly 11.5%,
the highest average unemployment rate of any region.

Overall, the plot shows that Sub-Saharan Africa has a
persistently high average rate of unemployment, and that Europe &
Central Asia has the highest recorded average rate of
unemployment.
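
The y-axis of each histogram is a count of yearly averages per bin. The sketch below illustrates that counting step with a bin width of 0.5 on hypothetical values. It is a simplification: bins here start at multiples of the bin width, whereas the plotting library chooses its own bin edges, so the counts per bar may differ slightly from the figure.

```python
import math
from collections import Counter


def bin_counts(values, binwidth=0.5):
    """Count how many values fall into each bin [left, left + binwidth)."""
    counts = Counter()
    for value in values:
        # Left edge of the bin that contains this value
        left = math.floor(value / binwidth) * binwidth
        counts[left] += 1
    return dict(sorted(counts.items()))


# Hypothetical yearly average unemployment rates for one region (%)
yearly_averages = [2.6, 2.8, 3.1, 3.4, 3.4, 3.9]
hist = bin_counts(yearly_averages)
print(hist)
```

Each key is the left edge of a bin and each value is the number of years in it, which is what the "Number of Years" axis reports.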
Average Sanitation across different
regions and income groups
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Set the theme before drawing; calling it after plotting does not affect this figure
sns.set_theme(style="whitegrid")

Socio_Economic_and_Life_expectancy = pd.read_csv(
    'C:\\Users\\huydo\\OneDrive\\Desktop\\Python\\life expectancy.csv')

filtered_data = Socio_Economic_and_Life_expectancy[
    ['Sanitation', 'Region', 'IncomeGroup']].dropna()

# Map each income group to a fixed color
income_groups = filtered_data['IncomeGroup'].unique()
colors = ['red', 'blue', 'green', 'yellow']
if len(income_groups) > len(colors):
    raise ValueError("Not enough colors defined for the number of unique IncomeGroups")
palette_dict = dict(zip(income_groups, colors[:len(income_groups)]))

plt.figure(figsize=(12, 8))
# One box per income group within each region
sns.boxplot(data=filtered_data, x='Region', y='Sanitation', hue='IncomeGroup',
            palette=palette_dict)
plt.xlabel('')
plt.ylabel('Sanitation %')
plt.title('Average Sanitation across different regions and Income Groups')
plt.xticks(rotation=45, ha='right')

plt.legend(title='Income Group')

plt.tight_layout()
plt.show()
Average Sanitation across different
regions and income groups
The plot visualizes the relationship between sanitation levels,
regions, and income groups. The x-axis shows the regions, while the
y-axis shows the percentage of the population in each region that
has access to safely managed sanitation services.

As a broad overview, it is clear that High income countries have
the highest sanitation coverage.

On closer examination, almost all High income countries across the
regions have more than 75% of the population using safe
sanitation services. North American countries have the highest
range, from 87.5% to nearly 100%. However, unlike other regions,
all High income countries in Latin America & Caribbean have
sanitation values ranging from around 31% to only 62.5%.

On the other hand, the lower income groups have sanitation
coverage ranging from 18.75% to 62.5%. Notably, Sub-Saharan Africa
has an even lower range, from around 13% to slightly above 25%.
Low income countries in Africa have the lowest range of sanitation
coverage, from 7% to around 17%.

In conclusion, the plot shows that higher income groups have more
of their population with access to safe sanitation services. It
also highlights the severe lack of sanitation in countries throughout
Sub-Saharan Africa.
