Srivardhan Python

The document summarizes data analysis on Welsh employment statistics from 2008-2019: 1. Public administration employed the most workers over the period while real estate employed the fewest. 2. Real estate saw the highest employment growth rate at 86.7% while retail saw the smallest at 0.64%. 3. 2018 had the highest overall employment of 1.45 million, while 2010 had the lowest employment of 1.33 million.

Uploaded by

Pallavi Pallu


5/22/2020 srivardhan python

Programming for data analysis

Name: katakam srivardhan hruday kuamr

student id: st20166815

Moodle code: CIS7031_S2_19

Moodle leader: Imitiaz Khan

1. Data Preparation
1.1 Downloaded the dataset for the period 2008 to 2019 from the StatsWales data source.

1.2 The data was processed and no outliers or null values were found in the dataset.

1.3 The industry names in the dataset were changed as specified in the assignment.
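The notebook does not show how the null and outlier checks in 1.2 were done. A minimal sketch of one way to run such checks follows, assuming a small illustrative subset of the table and the common 1.5 x IQR rule (both are assumptions, not the author's stated method):

```python
import pandas as pd

# Illustrative subset of the employment table (values taken from the notebook output)
sample = pd.DataFrame(
    {2009: [37700, 156700, 96600], 2010: [38200, 149800, 93200]},
    index=["Agriculture", "Production", "Construction"],
)

# Count missing values per column; all zeros means there are no nulls
null_counts = sample.isnull().sum()

# Flag values outside 1.5 * IQR of each column (assumed outlier rule)
q1, q3 = sample.quantile(0.25), sample.quantile(0.75)
iqr = q3 - q1
outlier_mask = (sample < q1 - 1.5 * iqr) | (sample > q3 + 1.5 * iqr)

print(null_counts)
print(outlier_mask.any(axis=1))
```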

In [1]:

import pandas as pd
import numpy as np
%matplotlib inline

In [2]:

base_data=pd.read_excel('C:\\Users\\Admin\\Downloads\\Python\\Dataset.xlsx')

Below is the final dataframe, which shows the total employment values for Wales.

localhost:8888/nbconvert/html/srivardhan python.ipynb?download=false 1/25


In [3]:

base_data.head(15)

Out[3]:

   Industry                 2009    2010    2011    2012    2013    2014    2015    2016  ...
0  Agriculture             37700   38200   36100   36100   36800   42700   40700   43200  ...
1  Production             156700  149800  158600  154400  164200  173300  172300  162500  ...
2  Construction            96600   93200   90000   91300   89300   97000   92600  102700  ...
3  Retail                 345400  344500  343100  347300  345100  337300  357700  360200  ...
4  ICT                     27800   27900   26400   27200   26900   35700   24000   34400  ...
5  Finance                 33800   29800   33200   31100   32400   32400   30800   31000  ...
6  Real_Estate             13500   14600   17600   18800   18000   22200   19100   22700  ...
7  Professional_Service   144800  145800  143600  137300  149900  152900  166200  161200  ...
8  Public_Adminstration   415600  418600  425600  421000  427000  427600  423200  418500  ...
9  Other_Service           64200   68000   72400   72800   75500   73300   77200   72400  ...

In [4]:

base_data.index=base_data['Industry']
base_data.head()

Out[4]:

Industry      Industry        2009    2010    2011    2012    2013    2014    2015    2016  ...
Agriculture   Agriculture    37700   38200   36100   36100   36800   42700   40700   43200  ...
Production    Production    156700  149800  158600  154400  164200  173300  172300  162500  ...
Construction  Construction   96600   93200   90000   91300   89300   97000   92600  102700  ...
Retail        Retail        345400  344500  343100  347300  345100  337300  357700  360200  ...
ICT           ICT            27800   27900   26400   27200   26900   35700   24000   34400  ...

In [5]:

del base_data['Industry']


In [6]:

base_data.head()

Out[6]:

Industry        2009    2010    2011    2012    2013    2014    2015    2016    2017  ...
Agriculture    37700   38200   36100   36100   36800   42700   40700   43200   40200  ...
Production    156700  149800  158600  154400  164200  173300  172300  162500  165100  ...
Construction   96600   93200   90000   91300   89300   97000   92600  102700   90800  ...
Retail        345400  344500  343100  347300  345100  337300  357700  360200  333500  ...
ICT            27800   27900   26400   27200   26900   35700   24000   34400   58900  ...

In [7]:

base_data['Total_Employees']=base_data.sum(axis=1)

In [8]:

base_data['Total_Employees_Growth']=round(((base_data[2018]/base_data[2009])-1)*100,2)

In [9]:

base_data.head(10)

Out[9]:

Industry                2009    2010    2011    2012    2013    2014    2015    2016  ...
Agriculture            37700   38200   36100   36100   36800   42700   40700   43200  ...
Production            156700  149800  158600  154400  164200  173300  172300  162500  ...
Construction           96600   93200   90000   91300   89300   97000   92600  102700  ...
Retail                345400  344500  343100  347300  345100  337300  357700  360200  ...
ICT                    27800   27900   26400   27200   26900   35700   24000   34400  ...
Finance                33800   29800   33200   31100   32400   32400   30800   31000  ...
Real_Estate            13500   14600   17600   18800   18000   22200   19100   22700  ...
Professional_Service  144800  145800  143600  137300  149900  152900  166200  161200  ...
Public_Adminstration  415600  418600  425600  421000  427000  427600  423200  418500  ...
Other_Service          64200   68000   72400   72800   75500   73300   77200   72400  ...

2. Data Analysis

In [10]:

base_data['Total_Employees'].min()
Out[10]:

189900

In [11]:

base_data[base_data['Total_Employees']==base_data['Total_Employees'].min()].index

Out[11]:

Index(['Real_Estate'], dtype='object', name='Industry')

In [12]:

base_data['Total_Employees'].max()

Out[12]:

4236500

In [13]:

base_data[base_data['Total_Employees']==base_data['Total_Employees'].max()].index

Out[13]:

Index(['Public_Adminstration'], dtype='object', name='Industry')

In [14]:

import plotly.express as px

2.1 Which industry employed the highest and lowest number of workers over the period?

We fetched workplace employment data by industry and area (Wales). The dataset was downloaded for 2008 to 2019, and the dataframe above holds the columns 2009 to 2018. Below is the visualisation using Python Plotly Express. It can be observed that public administration has the highest total employment over the period (4,236,500), while real estate has the lowest (189,900).
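As a compact alternative to filtering on min() and max(), the highest and lowest industries can be read off directly with idxmax and idxmin. The sketch below assumes an illustrative subset of four industries and the 2009 to 2016 columns shown in the output above, so its totals differ from the full-period totals in the notebook:

```python
import pandas as pd

# Subset of the table: four industries, years 2009-2016 (values from the notebook output)
years = [2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016]
sample = pd.DataFrame(
    [
        [345400, 344500, 343100, 347300, 345100, 337300, 357700, 360200],
        [27800, 27900, 26400, 27200, 26900, 35700, 24000, 34400],
        [13500, 14600, 17600, 18800, 18000, 22200, 19100, 22700],
        [415600, 418600, 425600, 421000, 427000, 427600, 423200, 418500],
    ],
    index=["Retail", "ICT", "Real_Estate", "Public_Adminstration"],
    columns=years,
)

# Row-wise sum gives total employment per industry over the selected years
totals = sample.sum(axis=1)

# idxmax / idxmin return the index label of the largest / smallest total
highest = totals.idxmax()
lowest = totals.idxmin()
print(highest, lowest)
```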


In [15]:

fig = px.bar(base_data, y="Total_Employees", x=base_data.index, color=base_data.index,
             text='Total_Employees')
fig.update_layout(title_text='Industry Employee Numbers')

fig.show()

[Figure: bar chart "Industry Employee Numbers", Total_Employees by industry]

2.2 Which industry has the highest and lowest overall growth over the period?

The visualisation below shows the percentage growth in employment for each industry over the period, computed from the 2009 and 2018 columns. It can be observed that real estate shows the highest growth in employment (86.67%), while retail shows the lowest (0.64%).
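The growth figures quoted above can be reproduced from the 2009 and 2018 columns alone. The following sketch repeats the notebook's formula on a standalone frame; the frame name `sample` is illustrative:

```python
import pandas as pd

# 2009 and 2018 employment per industry (values from the notebook output)
sample = pd.DataFrame(
    {
        2009: [37700, 156700, 96600, 345400, 27800, 33800, 13500, 144800, 415600, 64200],
        2018: [41100, 165700, 101800, 347600, 31500, 35500, 25200, 187100, 434900, 81800],
    },
    index=[
        "Agriculture", "Production", "Construction", "Retail", "ICT", "Finance",
        "Real_Estate", "Professional_Service", "Public_Adminstration", "Other_Service",
    ],
)

# Percentage growth from the first to the last year, rounded to two decimals
growth = ((sample[2018] / sample[2009] - 1) * 100).round(2)

print(growth.sort_values(ascending=False))
```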


In [16]:

fig = px.bar(base_data, y="Total_Employees_Growth", x=base_data.index, color=base_data.index,
             text='Total_Employees_Growth')
fig.update_layout(title_text='Industry Employee % Growth')
fig.show()

[Figure: bar chart "Industry Employee % Growth", Total_Employees_Growth by industry]

2.3 Which years are the best and worst performing years in relation to the number of employments (highest and lowest employment)?

The bar graph shows total employment for each year. 2018 is the best performing year, with the highest employment at 1.45 million, while 2010 is the worst performing year, with the lowest employment at 1.33 million.
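The yearly totals behind these two figures can be checked with a column-wise sum. The sketch below assumes only three of the year columns (2009, 2010 and 2018) to keep the example short, so idxmax and idxmin are taken over those three years only:

```python
import pandas as pd

# Employment per industry for three of the years (values from the notebook output)
sample = pd.DataFrame(
    {
        2009: [37700, 156700, 96600, 345400, 27800, 33800, 13500, 144800, 415600, 64200],
        2010: [38200, 149800, 93200, 344500, 27900, 29800, 14600, 145800, 418600, 68000],
        2018: [41100, 165700, 101800, 347600, 31500, 35500, 25200, 187100, 434900, 81800],
    },
    index=[
        "Agriculture", "Production", "Construction", "Retail", "ICT", "Finance",
        "Real_Estate", "Professional_Service", "Public_Adminstration", "Other_Service",
    ],
)

# Column-wise sum gives total employment across all industries for each year
yearly_totals = sample.sum(axis=0)
best_year = yearly_totals.idxmax()
worst_year = yearly_totals.idxmin()

print(yearly_totals)
print(best_year, worst_year)
```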


In [17]:

base_data2=base_data.T
base_data2.head()

Out[17]:

Industry  Agriculture  Production  Construction    Retail      ICT  Finance  Real_Estate  ...
2009          37700.0    156700.0       96600.0  345400.0  27800.0  33800.0      13500.0  ...
2010          38200.0    149800.0       93200.0  344500.0  27900.0  29800.0      14600.0  ...
2011          36100.0    158600.0       90000.0  343100.0  26400.0  33200.0      17600.0  ...
2012          36100.0    154400.0       91300.0  347300.0  27200.0  31100.0      18800.0  ...
2013          36800.0    164200.0       89300.0  345100.0  26900.0  32400.0      18000.0  ...

In [18]:

base_data2.head()

Out[18]:

Industry  Agriculture  Production  Construction    Retail      ICT  Finance  Real_Estate  ...
2009          37700.0    156700.0       96600.0  345400.0  27800.0  33800.0      13500.0  ...
2010          38200.0    149800.0       93200.0  344500.0  27900.0  29800.0      14600.0  ...
2011          36100.0    158600.0       90000.0  343100.0  26400.0  33200.0      17600.0  ...
2012          36100.0    154400.0       91300.0  347300.0  27200.0  31100.0      18800.0  ...
2013          36800.0    164200.0       89300.0  345100.0  26900.0  32400.0      18000.0  ...

In [19]:

base_data2['Yearly_Total_Employees']=base_data2.sum(axis=1)


In [20]:

base_data2.head(10)

Out[20]:

Industry  Agriculture  Production  Construction    Retail      ICT  Finance  Real_Estate  ...
2009          37700.0    156700.0       96600.0  345400.0  27800.0  33800.0      13500.0  ...
2010          38200.0    149800.0       93200.0  344500.0  27900.0  29800.0      14600.0  ...
2011          36100.0    158600.0       90000.0  343100.0  26400.0  33200.0      17600.0  ...
2012          36100.0    154400.0       91300.0  347300.0  27200.0  31100.0      18800.0  ...
2013          36800.0    164200.0       89300.0  345100.0  26900.0  32400.0      18000.0  ...
2014          42700.0    173300.0       97000.0  337300.0  35700.0  32400.0      22200.0  ...
2015          40700.0    172300.0       92600.0  357700.0  24000.0  30800.0      19100.0  ...
2016          43200.0    162500.0      102700.0  360200.0  34400.0  31000.0      22700.0  ...
2017          40200.0    165100.0       90800.0  333500.0  58900.0  32100.0      18200.0  ...
2018          41100.0    165700.0      101800.0  347600.0  31500.0  35500.0      25200.0  ...


In [21]:

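# base_data2 also carries the derived 'Total_Employees' and 'Total_Employees_Growth'
# rows (columns of base_data before the transpose), so plot the year rows only.
yearly=base_data2.drop(index=['Total_Employees','Total_Employees_Growth'])
fig=px.bar(yearly,x=yearly.index,y="Yearly_Total_Employees")
fig.update_layout(title='Yearly Total Employee',legend=dict(x=0,y=0.5))
fig.show()

[Figure: bar chart "Yearly Total Employee", Yearly_Total_Employees by year]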

3. Visual analysis


In [22]:

base_data3=pd.read_excel('C:\\Users\\Admin\\Downloads\\Python\\Dataset.xlsx')
base_data3.index=base_data3['Industry']
base_data3.head()

Out[22]:

Industry      Industry        2009    2010    2011    2012    2013    2014    2015    2016  ...
Agriculture   Agriculture    37700   38200   36100   36100   36800   42700   40700   43200  ...
Production    Production    156700  149800  158600  154400  164200  173300  172300  162500  ...
Construction  Construction   96600   93200   90000   91300   89300   97000   92600  102700  ...
Retail        Retail        345400  344500  343100  347300  345100  337300  357700  360200  ...
ICT           ICT            27800   27900   26400   27200   26900   35700   24000   34400  ...

3.1 Create a dynamic scatter/bubble plot showing the change of workforce number over the period using Plotly Express.

To plot the scatter chart, we first have to convert the wide dataframe (one column per year) into long format, with one row per industry and year. The code below performs this conversion.
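For comparison, the same wide-to-long reshaping can be sketched with pandas melt and a grouped diff. This is an alternative to the loop used in the notebook, not the author's method, and it assumes a two-industry, three-year subset with illustrative names (`wide`, `long_df`):

```python
import pandas as pd

# Wide table: one row per industry, one column per year (subset of the notebook data)
wide = pd.DataFrame(
    {2009: [37700, 27800], 2010: [38200, 27900], 2011: [36100, 26400]},
    index=pd.Index(["Agriculture", "ICT"], name="Industry"),
)

# melt turns the year columns into rows: one row per (Industry, Year) pair
long_df = (
    wide.reset_index()
    .melt(id_vars="Industry", var_name="Year", value_name="Workforce")
    .sort_values(["Industry", "Year"])
    .reset_index(drop=True)
)

# Year-over-year change within each industry; the first year has no
# predecessor, so its change is set to 0
long_df["Workforce_Change"] = (
    long_df.groupby("Industry")["Workforce"].diff().fillna(0)
)

print(long_df)
```

The grouped diff keeps the change calculation from leaking across industries, which is what the per-column loop achieves by building one frame per industry.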

In [23]:

del base_data3['Industry']

In [24]:

base_data4=base_data3.T

In [25]:

Final_df=pd.DataFrame(columns=['Year','Workforce','Industry','Workforce_Change'])
for col in base_data4.columns:
    if col!='Yearly_Total_Employees':
        final_data=pd.DataFrame(columns=['Year','Workforce','Industry','Workforce_Change'])
        final_data['Workforce']=base_data4[col].tolist()
        final_data['Industry']=col
        final_data['Year']=base_data4[col].index
        # Year-over-year change; the first year has no predecessor, so it becomes 0
        final_data['Workforce_Change']= final_data['Workforce'] - final_data['Workforce'].shift()
        final_data=final_data.fillna(0)
        # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
        Final_df=pd.concat([Final_df,final_data])

Final dataframe used to plot the scatter chart.


In [26]:

Final_df.head(100)

Out[26]:

    Year  Workforce       Industry  Workforce_Change
0   2009      37700    Agriculture               0.0
1   2010      38200    Agriculture             500.0
2   2011      36100    Agriculture           -2100.0
3   2012      36100    Agriculture               0.0
4   2013      36800    Agriculture             700.0
..   ...        ...            ...               ...
5   2014      73300  Other_Service           -2200.0
6   2015      77200  Other_Service            3900.0
7   2016      72400  Other_Service           -4800.0
8   2017      83200  Other_Service           10800.0
9   2018      81800  Other_Service           -1400.0

100 rows × 4 columns

The dynamic scatter plot below shows the year-over-year change in workforce over the period. The ICT industry shows the largest increase, in 2017 (+24.5k), followed by retail in 2015 (+20.4k). ICT also shows the largest decrease, in 2018 (-27.4k), followed by retail in 2017 (-26.7k). It can therefore be concluded that retail and ICT show the largest workforce changes over the period.


In [27]:

fig = px.scatter(Final_df, x="Year", y="Workforce_Change", color="Industry",
                 log_x=True, size_max=60)
fig.update_layout(title='Scatter plot of change in workforce')
fig.show()

[Figure: scatter plot "Scatter plot of change in workforce"; x-axis Year, y-axis Workforce_Change]

4. PCA/Correlation
PCA (Principal Component Analysis) is a dimensionality reduction method that reduces the dimensions of a dataset to a smaller set of variables. Using the Python code below, we compute a PCA with two components.
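To make the idea concrete, here is a minimal sketch of a two-component PCA with scikit-learn on a small subset of the table (five industries, three years). The subset and the variable names are assumptions; as in the notebook, the data is not standardised before fitting:

```python
import pandas as pd
from sklearn.decomposition import PCA

# Subset of the employment table: five industries, three years
sample = pd.DataFrame(
    {
        2009: [37700, 156700, 345400, 27800, 415600],
        2010: [38200, 149800, 344500, 27900, 418600],
        2011: [36100, 158600, 343100, 26400, 425600],
    },
    index=["Agriculture", "Production", "Retail", "ICT", "Public_Adminstration"],
)

# Project the three yearly columns onto two principal components
pca = PCA(n_components=2)
scores = pd.DataFrame(
    pca.fit_transform(sample.to_numpy()),
    columns=["PC1", "PC2"],
    index=sample.index,
)

# Share of the total variance captured by each component
ratio = pca.explained_variance_ratio_

print(scores.round(2))
print(ratio)
```

Because the yearly columns move together and differ mainly in scale between industries, the first component captures almost all of the variance in a sample like this one.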

In [28]:

from sklearn.decomposition import PCA

pca = PCA()
PCA_base=base_data3[[2009,2010,2011,2012,2013,2014,2015,2016,2017,2018]]


In [29]:

PCA_base.head()

Out[29]:

Industry        2009    2010    2011    2012    2013    2014    2015    2016    2017  ...
Agriculture    37700   38200   36100   36100   36800   42700   40700   43200   40200  ...
Production    156700  149800  158600  154400  164200  173300  172300  162500  165100  ...
Construction   96600   93200   90000   91300   89300   97000   92600  102700   90800  ...
Retail        345400  344500  343100  347300  345100  337300  357700  360200  333500  ...
ICT            27800   27900   26400   27200   26900   35700   24000   34400   58900  ...

In [30]:

pca.n_components = 2
X_reduced = pca.fit_transform(PCA_base)
df_X_reduced = pd.DataFrame(X_reduced,columns=['PC1','PC2'], index=PCA_base.index)
df_X_reduced=round(df_X_reduced,2)

In [31]:

df_X_reduced.head(10)

Out[31]:

Industry                     PC1        PC2
Agriculture           -312091.23   -8151.22
Production              76819.90     912.47
Construction          -137358.52   -8786.80
Retail                 658534.20  -15872.69
ICT                   -335201.17   10284.00
Finance               -334432.79  -10293.86
Real_Estate           -376204.46   -7409.96
Professional_Service    58629.49   33479.53
Public_Adminstration   903354.93    2773.17
Other_Service         -202050.35    3065.36


In [32]:

corr = df_X_reduced.T.corr()
corr.style.background_gradient(cmap='coolwarm')

Out[32]:

Industry              Agriculture  Production  Construction  Retail  ICT  Finance  ...
Agriculture                     1          -1             1      -1    1        1  ...
Production                     -1           1            -1       1   -1       -1  ...
Construction                    1          -1             1      -1    1        1  ...
Retail                         -1           1            -1       1   -1       -1  ...
ICT                             1          -1             1      -1    1        1  ...
Finance                         1          -1             1      -1    1        1  ...
Real_Estate                     1          -1             1      -1    1        1  ...
Professional_Service           -1           1            -1       1   -1       -1  ...
Public_Adminstration           -1           1            -1       1   -1       -1  ...
Other_Service                   1          -1             1      -1    1        1  ...

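Real estate, finance, agriculture, and ICT have large negative scores on principal component 1 (see Out[31]), which places them among the industries with the smallest workforces. Production, public administration, retail, and professional services have positive scores on component 1, so this component mainly separates industries with a larger workforce from those with a smaller one. Because each industry is described here by only two component scores, the correlations in Out[32] can only be +1 or -1.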


In [33]:

fig = px.scatter(df_X_reduced, x='PC1', y='PC2', color=df_X_reduced.index,
                 hover_name=df_X_reduced.index)
fig.update_layout(title='Principal Component Analysis Scatterplot')
fig.show()

[Figure: scatter plot "Principal Component Analysis Scatterplot"; x-axis PC1, y-axis PC2]

4.2 Correlation for each industry over years

The correlation matrix below shows the correlation between industries from 2009 to 2018. It can be observed that the agriculture industry is highly correlated with the construction industry (0.727), and other services is also positively correlated with professional services and public administration, whereas retail and ICT show a negative linear relationship (-0.552).
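The transpose-then-correlate step used for this matrix can be illustrated on two industries. The sketch below uses the 2009 to 2018 rows for agriculture and construction from the notebook output; the frame name `sample` is illustrative:

```python
import pandas as pd

# Yearly employment for two industries, 2009-2018 (values from the notebook output)
years = list(range(2009, 2019))
sample = pd.DataFrame(
    [
        [37700, 38200, 36100, 36100, 36800, 42700, 40700, 43200, 40200, 41100],
        [96600, 93200, 90000, 91300, 89300, 97000, 92600, 102700, 90800, 101800],
    ],
    index=["Agriculture", "Construction"],
    columns=years,
)

# corr() correlates columns, so transpose first to put industries in columns
corr = sample.T.corr().round(3)
r = corr.loc["Agriculture", "Construction"]

print(corr)
```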


In [34]:

PCA_base.head()

Out[34]:

Industry        2009    2010    2011    2012    2013    2014    2015    2016    2017  ...
Agriculture    37700   38200   36100   36100   36800   42700   40700   43200   40200  ...
Production    156700  149800  158600  154400  164200  173300  172300  162500  165100  ...
Construction   96600   93200   90000   91300   89300   97000   92600  102700   90800  ...
Retail        345400  344500  343100  347300  345100  337300  357700  360200  333500  ...
ICT            27800   27900   26400   27200   26900   35700   24000   34400   58900  ...

In [35]:

import matplotlib.pyplot as plt

corr = round(PCA_base.T.corr(),3)
corr.style.background_gradient(cmap='coolwarm')

Out[35]:

Industry              Agriculture  Production  Construction  Retail     ICT  Finance  ...
Agriculture                     1       0.647         0.727   0.228   0.378   -0.005  ...
Production                  0.647           1         0.188   0.028   0.232    0.225  ...
Construction                0.727       0.188             1   0.414    0.01    0.309  ...
Retail                      0.228       0.028         0.414       1  -0.552   -0.253  ...
ICT                         0.378       0.232          0.01  -0.552       1    0.043  ...
Finance                    -0.005       0.225         0.309  -0.253   0.043        1  ...
Real_Estate                 0.668       0.604         0.598   0.232   0.154    0.316  ...
Professional_Service        0.637        0.56         0.441   0.046   0.503    0.389  ...
Public_Adminstration        0.195       0.547          0.08  -0.258   0.122     0.59  ...
Other_Service               0.333       0.578        -0.031  -0.156   0.543    0.242  ...

5. Clustering (k means & hierarchical)

5.1 Using the employment data from the best and worst performing year columns (2.3), undertake a K-means clustering analysis (K=2 & 3) and identify which industries cluster together.


Below is the K-means clustering table. K_2 holds the cluster labels for K=2 and K_3 holds the cluster labels for K=3.
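A standalone sketch of the two K-means runs on the 2010 and 2018 columns follows. The `n_init` and `random_state` arguments are assumptions added for reproducibility and do not appear in the notebook. Cluster numbers are arbitrary labels, so only the grouping is meaningful, and with K=3 the grouping can vary between runs or seeds:

```python
import pandas as pd
from sklearn.cluster import KMeans

# 2010 and 2018 employment per industry (values from the notebook output)
data = pd.DataFrame(
    {
        2010: [38200, 149800, 93200, 344500, 27900, 29800, 14600, 145800, 418600, 68000],
        2018: [41100, 165700, 101800, 347600, 31500, 35500, 25200, 187100, 434900, 81800],
    },
    index=[
        "Agriculture", "Production", "Construction", "Retail", "ICT", "Finance",
        "Real_Estate", "Professional_Service", "Public_Adminstration", "Other_Service",
    ],
)
X = data.to_numpy()

# Fit K-means for K=2 and K=3; fit_predict returns one cluster label per row
k2 = pd.Series(
    KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X), index=data.index
)
k3 = pd.Series(
    KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X), index=data.index
)

print(pd.DataFrame({"K_2": k2 + 1, "K_3": k3 + 1}))
```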

In [36]:

cluster_base=base_data3[[2010,2018]]

In [37]:

cluster_base.head()

Out[37]:

Industry        2010    2018
Agriculture    38200   41100
Production    149800  165700
Construction   93200  101800
Retail        344500  347600
ICT            27900   31500

In [38]:

import matplotlib.pyplot as plt

from sklearn.cluster import KMeans
cluster = KMeans(n_clusters=2)
predicted_2 = cluster.fit_predict(cluster_base)

cluster2 = KMeans(n_clusters=3)
predicted_3 = cluster2.fit_predict(cluster_base)


In [39]:

cluster_base['K_2']=predicted_2+1
cluster_base['K_3']=predicted_3+1
cluster_base['K_2']=cluster_base['K_2'].astype(str)
cluster_base['K_3']=cluster_base['K_3'].astype(str)

C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
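The SettingWithCopyWarning above is raised because cluster_base was created as a column selection of base_data3 and new columns were then assigned to it. One common way to avoid it, sketched here on a two-row stand-in frame with hypothetical label values, is to take an explicit copy before adding columns:

```python
import pandas as pd

# Stand-in for base_data3 with two industries (values from the notebook output)
base = pd.DataFrame(
    {2010: [38200, 344500], 2018: [41100, 347600]},
    index=["Agriculture", "Retail"],
)

# .copy() makes the selection an independent frame, so adding columns
# to it is unambiguous and leaves the original frame untouched
subset = base[[2010, 2018]].copy()
subset["K_2"] = ["1", "2"]  # hypothetical cluster labels for illustration

print(subset)
```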


In [40]:

cluster_base.head(10)

Out[40]:

Industry                2010    2018  K_2  K_3
Agriculture            38200   41100    1    2
Production            149800  165700    1    1
Construction           93200  101800    1    2
Retail                344500  347600    2    3
ICT                    27900   31500    1    2
Finance                29800   35500    1    2
Real_Estate            14600   25200    1    2
Professional_Service  145800  187100    1    1
Public_Adminstration  418600  434900    2    3
Other_Service          68000   81800    1    2

Scatter plot of K-means clustering with K=2. From the two-cluster visualisation below it can be observed that most industries (eight) fall in cluster 1, while cluster 2 contains only two industries (retail and public administration), which have similar employment levels.


In [41]:

fig = px.scatter(cluster_base, x=2010, y=2018, color="K_2",
                 hover_name=cluster_base.index)
fig.update_layout(title='Scatter plot of K means clustering with K=2')
fig.show()

[Figure: scatter plot "Scatter plot of K means clustering with K=2"; x-axis 2010, y-axis 2018]

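Scatter plot of K-means clustering with K=3. From the three-cluster visualisation below it can be observed that most industries (six) fall in cluster 2, while cluster 1 (production and professional services) and cluster 3 (retail and public administration) each contain two industries with similar employment levels.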


In [42]:

fig = px.scatter(cluster_base, x=2010, y=2018, color="K_3",
                 hover_name=cluster_base.index)
fig.update_layout(title='Scatter plot of K means clustering with K=3')
fig.show()

[Figure: scatter plot "Scatter plot of K means clustering with K=3"; x-axis 2010, y-axis 2018]

5.2 Hierarchical cluster

A dendrogram is used to determine the appropriate number of clusters in hierarchical clustering, and it is the main output of the method. The vertical axis of the dendrogram represents the distance at which clusters are merged, measured here with Euclidean distance and Ward linkage. The number of clusters is found by drawing a horizontal line across the dendrogram at a chosen distance and counting the vertical lines it crosses. Using the dendrogram produced by the code below, we identified 6 clusters.
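Cutting a hierarchical clustering into a fixed number of groups can also be done directly with SciPy, without reading the plot. The sketch below applies Ward linkage to the 2010 and 2018 columns and uses `fcluster` with the `maxclust` criterion; the use of `fcluster` is an assumption for illustration, since the notebook uses scikit-learn's AgglomerativeClustering for this step:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# 2010 and 2018 employment for the ten industries (values from the notebook output)
X = np.array(
    [
        [38200, 41100], [149800, 165700], [93200, 101800], [344500, 347600],
        [27900, 31500], [29800, 35500], [14600, 25200], [145800, 187100],
        [418600, 434900], [68000, 81800],
    ]
)

# Ward linkage on Euclidean distances; each row of Z records one merge,
# and column 2 of Z is the distance at which that merge happened
Z = linkage(X, method="ward")

# Cut the tree so that at most six flat clusters remain
labels = fcluster(Z, t=6, criterion="maxclust")

print(Z[:, 2])
print(labels)
```

The merge distances in `Z[:, 2]` are the heights drawn on the dendrogram's vertical axis, so a large jump between consecutive values suggests a natural place to cut.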


In [43]:

import scipy.cluster.hierarchy as sch

# Create the dendrogram. linkage() runs the hierarchical clustering algorithm
# itself, and its first argument is the data to cluster (the 2010 and 2018 columns).
dendrogram = sch.dendrogram(sch.linkage(base_data3[[2010,2018]], method = "ward"))
plt.title('Dendrogram')
plt.xlabel('Industries')
plt.ylabel('Euclidean distances')
plt.show()

In [44]:

from sklearn.cluster import AgglomerativeClustering

# Ward linkage always uses Euclidean distance, so the affinity argument
# (renamed to metric and later removed in newer scikit-learn releases) is omitted.
hc = AgglomerativeClustering(n_clusters = 6, linkage ='ward')
# Fit the hierarchical clustering algorithm to the data and build the vector
# that gives, for each industry, the cluster it belongs to.
y_hc=hc.fit_predict(base_data3[[2010,2018]])


In [45]:

cluster_base['Hierarchical_clustering']=y_hc
cluster_base['Hierarchical_clustering']=cluster_base['Hierarchical_clustering']+1
cluster_base['Hierarchical_clustering']=cluster_base['Hierarchical_clustering'].astype(str)

C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

The scatter plot below is created using the six hierarchical clusters.


In [46]:

fig = px.scatter(cluster_base, x=2010, y=2018, color="Hierarchical_clustering",
                 hover_name=cluster_base.index)
fig.update_layout(title='Scatter plot of Hierarchical clustering with K=6')
fig.show()

[Figure: scatter plot "Scatter plot of Hierarchical clustering with K=6"; x-axis 2010, y-axis 2018]

K-means forms clusters using a predetermined number of clusters. Here we identified the industry clusters for the best and worst performing years of employment with k = 2 and k = 3. Hierarchical clustering, as the name suggests, builds a hierarchy of clusters. From its dendrogram we chose k = 6 industry clusters for the same best and worst performing years.

6. Discussion
Provide a brief discussion (~ 300 words) on employment landscape of Wales based on the employment data
analysis results.


From the report it can be observed that employment in Wales is highest in public administration, followed by retail, production, and professional services. The smallest workforce is in real estate, although it shows the highest percentage growth in employment over the period. Retail has the second-highest workforce, but this industry has the lowest percentage growth over the period. Looking at total Wales employment by year, 2018 shows the highest employment, with real estate growing 38% over the previous year while ICT shows negative growth. In 2010, Wales shows the lowest total workforce, with average negative (-2%) growth. From the correlation matrix, it can be observed that the agriculture industry is highly correlated with the construction industry (0.727), and other services is also positively correlated with professional services and public administration, whereas retail and ICT show a negative linear relationship (-0.552).

No ratings yet
CN7021 Project Template
8 pages
Hitbsecconf2014: Malaysia Hotel Reservation Form
No ratings yet
Hitbsecconf2014: Malaysia Hotel Reservation Form
1 page
Travel Management
No ratings yet
Travel Management
101 pages
Design of Motor Vehicle Insurance Policy Management Application
No ratings yet
Design of Motor Vehicle Insurance Policy Management Application
11 pages
May 22nd To May 29th, 2010: Reservations Department
No ratings yet
May 22nd To May 29th, 2010: Reservations Department
1 page
Hotel Reservation: Personal Information
No ratings yet
Hotel Reservation: Personal Information
2 pages
Ijert Ijert: Online Insurance Management System
No ratings yet
Ijert Ijert: Online Insurance Management System
4 pages
Burner Logic System PDF
No ratings yet
Burner Logic System PDF
5 pages
Alarm System - DSC Pc1555 - Faq
No ratings yet
Alarm System - DSC Pc1555 - Faq
3 pages
Categorical Propositions Syllolism
No ratings yet
Categorical Propositions Syllolism
6 pages
Unit Ii
No ratings yet
Unit Ii
45 pages
Summary Performance Rating
No ratings yet
Summary Performance Rating
51 pages
Nano Fluids PDF
No ratings yet
Nano Fluids PDF
22 pages
NUMERALS (Bilangan) : Cardinal Ordinal Fraction
No ratings yet
NUMERALS (Bilangan) : Cardinal Ordinal Fraction
3 pages
MECAPRO Ang - A
100% (1)
MECAPRO Ang - A
99 pages
Stretch Wrap
No ratings yet
Stretch Wrap
18 pages
Thesis Vamk
100% (3)
Thesis Vamk
7 pages
Knowing Atoms Better: Htt0.p://phet - Colorado.edu/en/simulation/build-An-Atom
No ratings yet
Knowing Atoms Better: Htt0.p://phet - Colorado.edu/en/simulation/build-An-Atom
5 pages
Spring Boot
No ratings yet
Spring Boot
19 pages
Srm-3006: Selective Radiation Meter For Electromagnetic Fields Up To 6 GHZ
No ratings yet
Srm-3006: Selective Radiation Meter For Electromagnetic Fields Up To 6 GHZ
24 pages
Nexus Charge Amplifier
No ratings yet
Nexus Charge Amplifier
16 pages
TIME
No ratings yet
TIME
10 pages
Omega Low Voltage Recessed Downlight Catalog 9-84
No ratings yet
Omega Low Voltage Recessed Downlight Catalog 9-84
16 pages
JT Spiral-Tube Brochure
No ratings yet
JT Spiral-Tube Brochure
2 pages
Technical Cataloguel
No ratings yet
Technical Cataloguel
20 pages
Atmega32A DataSheet Complete DS40002072A 4
No ratings yet
Atmega32A DataSheet Complete DS40002072A 4
15 pages
Bitumen Emulsion
No ratings yet
Bitumen Emulsion
8 pages
Radar Link Budget Analysis Using MATLAB Radar Designer
No ratings yet
Radar Link Budget Analysis Using MATLAB Radar Designer
14 pages
Final Test Series (Online) JEE (Advanced) - 2021: Phase-I
No ratings yet
Final Test Series (Online) JEE (Advanced) - 2021: Phase-I
16 pages
An Empirical Evaluation of Generic Convolutional and Recurrent Networks For Sequence Modeling
No ratings yet
An Empirical Evaluation of Generic Convolutional and Recurrent Networks For Sequence Modeling
14 pages
Leverages
No ratings yet
Leverages
13 pages
Customizing AutoCAD Isometrics in AutoCAD Plant 3D
No ratings yet
Customizing AutoCAD Isometrics in AutoCAD Plant 3D
11 pages
Hydraulic Engineering Lab Manual
No ratings yet
Hydraulic Engineering Lab Manual
27 pages
Developing Machine Learning Based Passenger Behaviour Assesment Model For Davangere Bus Transport System
No ratings yet
Developing Machine Learning Based Passenger Behaviour Assesment Model For Davangere Bus Transport System
15 pages