Lecture 12 - Art and Science of Data Visualization

The document discusses the art and science of data visualization, focusing on techniques for visualizing categorical data and 1D relations using Python libraries like Pandas and Seaborn. It includes examples of visualizing school data across different localities and crime data, showcasing frequency tables, percentages, and cumulative percentages. Additionally, it demonstrates various plotting techniques such as bar plots, box plots, and violin plots to effectively represent the data.

Art and Science of Data Visualization

Visualizing 1D relations

Visualizing Categorical Data


In [1]: import pandas as pd
        # link to the data repository
        linkRepo = 'https://github.com/resourcesbookvisual/data/'
        linkFile = 'raw/master/eduwa.csv'
        fullLink = linkRepo + linkFile
        eduwa = pd.read_csv(fullLink)
        import seaborn.objects as so

In [2]: # List of public schools in Washington State
        display(eduwa.head())
        eduwa.columns

   NCES.School.ID State.School.ID NCES.District.ID State.District.ID Low.Grade High.Grade                          School.Name                           District        County
0    530486002475   WA-31025-1656          5304860          WA-31025         6          8                   10th Street School         Marysville School District     Snohomish
1    530270001270   WA-06114-1646          5302700          WA-06114        KG         12                  49th Street Academy  Evergreen School District (Clark)         Clark
2    530910002602   WA-34033-4500          5309100          WA-34033         9         12     A G West Black Hills High School           Tumwater School District      Thurston
3    530003000001   WA-14005-2834          5300030          WA-14005        PK          6                  A J West Elementary           Aberdeen School District  Grays Harbor
4    530825002361   WA-32081-1533          5308250          WA-32081         9         12       A-3 Multiagency Adolescent Prog            Spokane School District       Spokane

5 rows × 24 columns (remaining columns truncated in the display)

Out[2]: Index(['NCES.School.ID', 'State.School.ID', 'NCES.District.ID',


'State.District.ID', 'Low.Grade', 'High.Grade', 'School.Name',
'District', 'County', 'Street.Address', 'City', 'State', 'ZIP',
'ZIP.4-digit', 'Phone', 'Locale.Code', 'LocaleType', 'LocaleSub',
'Charter', 'Title.I.School', 'Title.1.School.Wide',
'Student.Teacher.Ratio', 'Free.Lunch', 'Reduced.Lunch'],
dtype='object')

In [3]: # Let us look at the locality type
        # value_counts() is called on the Series directly (pd.value_counts is deprecated)
        FTloc = eduwa.LocaleType.value_counts(dropna=False).reset_index()
        FTloc.columns = ['Location', 'Count']
        # Fill NA as Uncategorized
        FTloc.fillna('Uncategorized', inplace=True)
        FTloc

Out[3]:    Location       Count
        0  Suburb           798
        1  City             714
        2  Rural            505
        3  Town             338
        4  Uncategorized     72
In [7]: # Are schools distributed equally across the localities?
        fig = so.Plot(FTloc, x='Location', y='Count').add(so.Bar())
        so.Plot.show(fig)

In [24]: # Visualizing gaps: each locality's deviation from an equal share
         # (25% assumes an even split across the four named locality types)
         FTloc['Percentage'] = 100*(FTloc.Count/FTloc.Count.sum()).round(4)
         FTloc['Gap'] = FTloc['Percentage'] - 25
         fig = so.Plot(FTloc, x='Location', y='Gap').add(so.Bar())
         so.Plot.show(fig)

In [5]: linkRepo = 'https://github.com/resourcesbookvisual/data/'
        linkFile = 'raw/master/crime.csv'
        fullLink = linkRepo + linkFile
        crime = pd.read_csv(fullLink)
In [12]: display(crime.head())
         crime.columns

   ReportNumber OccurredDate    year  month  weekday  OccurredTime OccurredDayTime ReportedDate ReportedTime
0  2.013000e+13   2013-07-09  2013.0    7.0  Tuesday        1930.0         evening   2013-07-10       1722.0
1  2.013000e+13   2013-07-09  2013.0    7.0  Tuesday        1917.0         evening   2013-07-09       2052.0
2  2.013000e+13   2013-07-09  2013.0    7.0  Tuesday        1900.0         evening   2013-07-10         35.0
3  2.013000e+13   2013-07-09  2013.0    7.0  Tuesday        1900.0         evening   2013-07-10       1258.0
4  2.013000e+13   2013-07-09  2013.0    7.0  Tuesday        1846.0         evening   2013-07-09       1846.0

(remaining columns, including DaysToReport, truncated in the display)

Out[12]: Index(['ReportNumber', 'OccurredDate', 'year', 'month', 'weekday',


'OccurredTime', 'OccurredDayTime', 'ReportedDate', 'ReportedTime',
'DaysToReport', 'crimecat', 'CrimeSubcategory',
'PrimaryOffense.Description', 'Precinct', 'Sector', 'Beat',
'Neighborhood'],
dtype='object')

In [13]: # Frequency table
         FTcri = crime.crimecat.value_counts(dropna=False).reset_index()
         FTcri.columns = ['Crimes','Counts']
         FTcri.head()

Out[13]:    Crimes              Counts
         0  THEFT               170946
         1  CAR PROWL           142447
         2  BURGLARY             76630
         3  AGGRAVATED ASSAULT   21315
         4  NARCOTIC             16864

In [15]: # adding Percentage
         FTcri['Percent'] = 100*FTcri.Counts/FTcri.Counts.sum()
         # adding Cumulative Percentage
         FTcri['CumPercent'] = 100*FTcri.Counts.cumsum()/FTcri.Counts.sum()
         # renaming missing values; plain assignment avoids pandas' chained-assignment warning
         FTcri['Crimes'] = FTcri['Crimes'].fillna('UNCATEGORIZED')
         FTcri.head()

Out[15]:    Crimes              Counts    Percent  CumPercent
         0  THEFT               170946  34.209863   34.209863
         1  CAR PROWL           142447  28.506618   62.716481
         2  BURGLARY             76630  15.335262   78.051743
         3  AGGRAVATED ASSAULT   21315   4.265576   82.317320
         4  NARCOTIC             16864   3.374838   85.692158


In [16]: p = so.Plot(FTcri, y='CumPercent', x='Crimes').add(so.Bar())
         p  # the x tick labels overlap and are unreadable; how do we rotate them?
Out[16]:

In [23]: # Use matplotlib to rotate the tick labels
         import matplotlib.pyplot as plt
         fig, ax = plt.subplots()
         p = so.Plot(FTcri, y='CumPercent', x='Crimes').add(so.Bar()).on(ax)
         ax.xaxis.set_tick_params(rotation=90)
         p.show()
         # How to show the major crimes, i.e. those contributing 80% of the total?
In [41]: fig, ax = plt.subplots()
         FTcri['threshold'] = 80
         # truncated line completed: the dashed line marks the 80% cutoff
         p = (so.Plot(FTcri)
                .add(so.Bar(), y='CumPercent', x='Crimes')
                .add(so.Line(linestyle='dashed'), y='threshold', x='Crimes')
                .on(ax))
         ax.xaxis.set_tick_params(rotation=90)
         p.show()
In [38]: # Horizontal bar plot
         fig = so.Plot(FTcri, x='CumPercent', y='Crimes').add(so.Bar())
         so.Plot.show(fig)
In [42]: # Highest grade offered by each school
         eduwa['High.Grade']
         # Can we visualize the number of schools by highest grade offered?
Out[42]: 0 8
1 12
2 12
3 6
4 12
..
2422 12
2423 6
2424 12
2425 6
2426 8
Name: High.Grade, Length: 2427, dtype: object

In [44]: hg = eduwa['High.Grade'].value_counts().reset_index()
         so.Plot(hg, 'High.Grade', 'count').add(so.Bar())

Out[44]:

In [48]: ordLabels = ["PK","KG","1","2","3","4","5","6","7","8","9","10","11","12","13"]
         HGtype = pd.CategoricalDtype(categories=ordLabels, ordered=True)
         display(HGtype)
         # apply that ordered categorical type to the column
         eduwa['High.Grade-O'] = eduwa['High.Grade'].astype(HGtype)
         display(eduwa['High.Grade-O'])
CategoricalDtype(categories=['PK', 'KG', '1', '2', '3', '4', '5', '6', '7', '8', '9',
                             '10', '11', '12', '13'],
                 ordered=True, categories_dtype=object)

0 8
1 12
2 12
3 6
4 12
..
2422 12
2423 6
2424 12
2425 6
2426 8
Name: High.Grade-O, Length: 2427, dtype: category
Categories (15, object): ['PK' < 'KG' < '1' < '2' ... '10' < '11' < '12' < '13']
In [49]: # Frequency table, keeping the grade order (sort=False)
         FThg = eduwa['High.Grade-O'].value_counts(sort=False, dropna=False).reset_index()
         FThg.columns = ['MaxOffer','Counts']
         # adding a cumulative percentage column
         FThg['CumPercent'] = 100*FThg.Counts.cumsum()/FThg.Counts.sum()
         display(FThg)

    MaxOffer  Counts  CumPercent
0         PK      82    3.378657
1         KG       7    3.667079
2          1       6    3.914297
3          2      16    4.573548
4          3      19    5.356407
5          4      45    7.210548
6          5     755   38.318912
7          6     266   49.278945
8          7      11   49.732180
9          8     427   67.325917
10         9      15   67.943964
11        10       7   68.232386
12        11       5   68.438401
13        12     757   99.629172
14        13       9  100.000000

In [50]: # Visualize using a bar plot
         fig = so.Plot(FThg, x='MaxOffer', y='Counts').add(so.Bar())
         so.Plot.show(fig)
In [55]: # Visualize using a boxplot over the ordered category codes
         import seaborn as sns
         import numpy as np
         eduwa['High.Grade-N'] = eduwa['High.Grade-O'].cat.codes
         sns.boxplot(eduwa, x='High.Grade-N')
         # relabel the numeric ticks with the original grade labels
         plt.xticks(np.arange(0,14), ['PK','KG',1,2,3,4,5,6,7,8,9,10,11,12]);

In [58]: # Combining a box plot with a density plot: the violin plot
         sns.boxplot(eduwa, x='High.Grade-N', fill=False)
         sns.violinplot(eduwa, x='High.Grade-N', width=1.2, fill=True)
         plt.ylim([-0.6, 0.6])
         plt.xticks(np.arange(0,14), ['PK','KG',1,2,3,4,5,6,7,8,9,10,11,12]);
Visualizing Numerical Data

In [49]: eduwa['Reduced.Lunch']
Out[49]: 0 3.0
1 9.0
2 40.0
3 10.0
4 4.0
...
2422 0.0
2423 57.0
2424 51.0
2425 35.0
2426 38.0
Name: Reduced.Lunch, Length: 2427, dtype: float64

In [59]: # Visualize using a bar plot of counts
         so.Plot(eduwa, x='Reduced.Lunch').add(so.Bar(), so.Count()).label(y=r"$count (\mathbf{N})$").show()
In [52]: # Visualize using a boxplot
         sns.boxplot(eduwa, x='Reduced.Lunch')
         import matplotlib.pyplot as plt
         plt.yticks([-0.4,-0.2,0,0.2,0.4], [-0.4,-0.2,0,0.2,0.4])
         plt.grid()
In [64]: # Visualize using a histogram and compare against a normal density
         import numpy as np
         import scipy.stats as stats
         statVals = eduwa['Reduced.Lunch'].describe().to_dict()
         display(statVals)
         Start = 0
         width = 10
         newMax = 310
         TheBreaks = np.arange(Start, newMax+width, width)
         display(TheBreaks)
         intervals = pd.cut(eduwa['Reduced.Lunch'], bins=TheBreaks, include_lowest=True)
         display(intervals)
         topCount = intervals.value_counts().max()
         print(topCount)
         widthY = 50
         reminderY = topCount % widthY
         # round topCount up to the next multiple of widthY for the y-axis
         top_Y = topCount if reminderY == 0 else topCount + widthY - reminderY
         vertiVals = list(range(0, top_Y+widthY, widthY))
         N = statVals['count']
         MEAN = statVals['mean']
         STD = statVals['std']
         # expected count in a bin of width w near x: pdf(x) * n * w
         def NormalHist(x, m=MEAN, s=STD, n=N, w=width):
             return stats.norm.pdf(x, m, s)*n*w
{'count': 2296.0,
'mean': 33.53440766550523,
'std': 36.556836589555466,
'min': 0.0,
'25%': 5.0,
'50%': 25.5,
'75%': 47.0,
'max': 301.0}

array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120,
130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250,
260, 270, 280, 290, 300, 310])

0 (-0.001, 10.0]
1 (-0.001, 10.0]
2 (30.0, 40.0]
3 (-0.001, 10.0]
4 (-0.001, 10.0]
...
2422 (-0.001, 10.0]
2423 (50.0, 60.0]
2424 (50.0, 60.0]
2425 (30.0, 40.0]
2426 (30.0, 40.0]
Name: Reduced.Lunch, Length: 2427, dtype: category
Categories (31, interval[float64, right]): [(-0.001, 10.0] < (10.0, 20.0] < (20.0, 30.0] < (30.0, 40.0]
... (270.0, 280.0] < (280.0, 290.0] < (290.0, 300.0] < (300.0, 310.0]]

731
In [66]: fig, ax = plt.subplots()
         # truncated comment completed: a KDE overlay could be added via .add(so.Line(), so.KDE())
         p = so.Plot(eduwa, x='Reduced.Lunch').add(so.Bar(), so.Hist("density")).on(ax)
         ax.plot(np.arange(0,300), NormalHist(np.arange(0,300))/sum(NormalHist(np.arange(0,300))), label="Ideal")
         p.show()
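The commented-out hint in the cell above points at an alternative smoother. A minimal sketch that overlays seaborn's built-in KDE stat instead of the hand-rolled normal curve (the default so.KDE bandwidth is assumed):

fig, ax = plt.subplots()
p = (so.Plot(eduwa, x='Reduced.Lunch')
       .add(so.Bar(), so.Hist("density"))  # histogram scaled to integrate to 1
       .add(so.Line(), so.KDE())           # kernel density estimate overlay
       .on(ax))
p.show()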

Visualizing 2D relations

Categorical-Categorical relations

In [81]: # Looking at the relation between precinct and time of occurrence of crime
         crime[['Precinct', 'OccurredDayTime']]
         PrecinctDaytime = pd.crosstab(crime.Precinct, crime.OccurredDayTime, margins=True)
         P1 = PrecinctDaytime.sort_values('All', ascending=False).drop("All").drop("All", axis=1)
         P1
Out[81]: OccurredDayTime  afternoon    day  evening  night
         Precinct
         NORTH                48754  33744    39867  37942
         WEST                 48931  30366    33766  30925
         EAST                 20774  15976    17380  19880
         SOUTH                22147  17322    16240  15497
         SOUTHWEST            14221  10595    11169  11034


In [105]: P = P1.stack().reset_index()
          P.columns = ['Precinct', 'OccurredDayTime', 'Count']
          so.Plot(P, x='Precinct', y='Count', color='OccurredDayTime').add(so.Bars(), so.Stack())
Out[105]:

In [94]: # A better option is side-by-side (dodged) bars; width=0.9 leaves spacing between groups
         so.Plot(P, x='Precinct', y='Count', color='OccurredDayTime').add(so.Bars(width=0.9), so.Dodge())
Out[94]:

In [119]: # Relative contribution of each precinct within each time of day
          PrecinctDaytime = pd.crosstab(crime.Precinct, crime.OccurredDayTime, normalize='columns')
          # display(PrecinctDaytime)
          P2 = PrecinctDaytime.stack().reset_index()
          P2.columns = ['Precinct', 'OccurredDayTime', 'Count']
          display(P2.head())
   Precinct OccurredDayTime     Count
0      EAST       afternoon  0.134176
1      EAST             day  0.147922
2      EAST         evening  0.146763
3      EAST           night  0.172453
4     NORTH       afternoon  0.314893


In [121]: # Stacked bars of precinct shares within each time of day
          so.Plot(P2, x='OccurredDayTime', y='Count', color='Precinct').add(so.Bar(width=0.9), so.Stack())
Out[121]:

In [68]: # What about a big crosstab?
         CrimeDay = pd.crosstab(crime.crimecat, crime.OccurredDayTime)
         CrimeDay
Out[68]: OccurredDayTime            afternoon    day  evening  night
         crimecat
         AGGRAVATED ASSAULT              5366   3564     4884   7501
         ARSON                            167    196      191    486
         BURGLARY                       22288  24139    14121  16082
         CAR PROWL                      38273  26740    42595  34839
         DISORDERLY CONDUCT                81     41       67     79
         DUI                              939    706     2038   8522
         FAMILY OFFENSE-NONVIOLENT       2516   1748     1217   1120
         GAMBLE                             4      4        7      2
         HOMICIDE                          46     41       49    131
         LIQUOR LAW VIOLATION             491    112      410    606
         LOITERING                         31     20       25      9
         NARCOTIC                        6416   2415     3924   4109
         PORNOGRAPHY                       53     65       17     31
         PROSTITUTION                     675    115     1425   1340
         RAPE                             318    332      354    854
         ROBBERY                         4737   2584     4139   5372
         SEX OFFENSE-OTHER               1759   1501     1014   1776
         THEFT                          64868  38687    38980  28410
         TRESPASS                        5184   4848     2598   3289
         WEAPON                          1445    735      947   1624

In [123]: CrimeDay_n = pd.crosstab(crime.crimecat, crime.OccurredDayTime, normalize='columns')
          CrimeDay_df = CrimeDay_n.stack().reset_index()
          CrimeDay_df.columns = ["Crime", "DayTime", "NormalizedCounts"]
          dayorder = pd.CategoricalDtype(["day", "afternoon", "evening", "night"], ordered=True)
          CrimeDay_df['DayTime-O'] = CrimeDay_df.DayTime.astype(dayorder)
          display(CrimeDay_df.head(5))
                Crime    DayTime  NormalizedCounts  DayTime-O
0  AGGRAVATED ASSAULT  afternoon          0.034473  afternoon
1  AGGRAVATED ASSAULT        day          0.032820        day
2  AGGRAVATED ASSAULT    evening          0.041041    evening
3  AGGRAVATED ASSAULT      night          0.064562      night
4               ARSON  afternoon          0.001073  afternoon


In [156]: # Dot-matrix view: point size encodes the normalized count (truncated line completed)
          so.Plot(CrimeDay_df, x='DayTime-O', y='Crime', pointsize='NormalizedCounts').add(so.Dot(), legend=False)
Out[156]:

In [198]: CrimeDay1 = pd.crosstab(crime.crimecat, crime.OccurredDayTime, normalize='columns', margins=True)
          CrimeDay1.sort_values("All", ascending=False, inplace=True)
          # assign the result: drop() with inplace=False returns a new frame
          CrimeDay1 = CrimeDay1.drop("All", axis=1)
          CrimeDay_df1 = CrimeDay1.stack().reset_index()
          CrimeDay_df1.columns = ["Crime", "DayTime", "NormalizedCounts"]
          CrimeDay_df1['DayTime-O'] = CrimeDay_df1.DayTime.astype(dayorder)
          CrimeDay_df1['Percent'] = 100*CrimeDay_df1["NormalizedCounts"].round(1)
          # display(CrimeDay_df1.head(5))
          # truncated line completed: dots sized by count, labeled with the percentage
          so.Plot(CrimeDay_df1, x='DayTime-O', y='Crime', pointsize='NormalizedCounts',
                  text='Percent').add(so.Dot(), legend=False).add(so.Text())
Out[198]:
In [72]: # Flipped bar chart
         # sort the dataframe in descending order of normalized counts
         CrimeDay_df = CrimeDay_df.sort_values(by='NormalizedCounts', ascending=False)
         # truncated line completed with an assumed axis label
         so.Plot(CrimeDay_df, y='Crime', x='NormalizedCounts').facet('DayTime-O').add(so.Bars(width=1)).label(x="Share")
Out[72]:

In [73]: # Heatmap: a graphical representation of data where values are depicted by color
         sns.heatmap(CrimeDay_df.pivot(index='Crime', columns='DayTime-O', values='NormalizedCounts'))
Out[73]: <Axes: xlabel='DayTime-O', ylabel='Crime'>
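The default heatmap leaves the reader to decode color alone. A minimal sketch of a more legible variant (the annotation, colormap, and colorbar-label choices here are illustrative, not from the lecture):

# same pivoted matrix as above; annot writes each value into its cell
mat = CrimeDay_df.pivot(index='Crime', columns='DayTime-O', values='NormalizedCounts')
sns.heatmap(mat, annot=True, fmt='.2f', cmap='viridis',
            cbar_kws={'label': 'share within time of day'})
plt.tight_layout()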
Categorical-Numerical relation (2 variables)

In [6]: crime.year.value_counts()
        crime2 = crime[crime.year > 2007].copy()
        crime2.dropna(subset=['DaysToReport'], inplace=True)
        crime2.fillna(value={'crimecat': 'Uncategorized'}, inplace=True)
        display(crime2[['crimecat','DaysToReport']].head(10))
        maxD = crime2.groupby('crimecat').describe()['DaysToReport'][['max', 'mean', 'std']]
        maxD.head(5)

                    crimecat  DaysToReport
0                   NARCOTIC           1.0
1                   BURGLARY           0.0
2                  CAR PROWL           1.0
3                      THEFT           1.0
4  FAMILY OFFENSE-NONVIOLENT           0.0
5                   BURGLARY           0.0
6                      THEFT           0.0
7                      THEFT           0.0
8                  CAR PROWL           1.0
9                      THEFT           0.0

Out[6]:                        max      mean        std
        crimecat
        AGGRAVATED ASSAULT  2136.0  2.457019  34.881287
        ARSON                151.0  0.901734   7.616278
        BURGLARY            3653.0  4.225830  34.548476
        CAR PROWL           2923.0  3.282945  31.155756
        DISORDERLY CONDUCT    95.0  0.417910   5.810746

In [8]: import seaborn as sns
        # order crime categories by their maximum reporting delay (truncated line completed)
        sns.boxplot(crime2, y='crimecat', x=crime2['DaysToReport']/365,
                    order=maxD.sort_values(by='max', ascending=False).index)
Out[8]: <Axes: xlabel='DaysToReport', ylabel='crimecat'>


Numerical-Numerical relations

In [79]: Crimebyday = crime2.OccurredDate.value_counts().reset_index()
         Crimebyday.columns = ['Dates', 'Count']
         # display(Crimebyday)
         Crimebyday['Dates-F'] = pd.to_datetime(Crimebyday.Dates, format='%Y-%m-%d')
         display(Crimebyday)

           Dates  Count    Dates-F
0     2017-07-01    199 2017-07-01
1     2017-05-26    192 2017-05-26
2     2016-01-20    186 2016-01-20
3     2015-12-01    184 2015-12-01
4     2018-07-19    183 2018-07-19
...          ...    ...        ...
3958  2011-12-25     58 2011-12-25
3959  2008-12-25     52 2008-12-25
3960  2018-11-06     48 2018-11-06
3961  2012-01-19     47 2012-01-19
3962  2012-01-18     43 2012-01-18

3963 rows × 3 columns

In [78]: so.Plot(Crimebyday, x='Dates-F', y='Count').add(so.Line())  # x ticks may need adjusting
Out[78]:
In [80]: # Adding a smooth trend line (truncated line completed with so.PolyFit, a polynomial fit)
         so.Plot(Crimebyday, x='Dates-F', y='Count').add(so.Dots(marker='+')).add(so.Line(color="0.2"), so.PolyFit())
Out[80]:

In [81]: # Ridge-style plots: visualize the density of daily counts for each year
         so.Plot(Crimebyday, x='Count').add(so.Area(), so.KDE()).facet(Crimebyday['Dates-F'].dt.year, wrap=3)
Out[81]:

Correlation (Bivariate analysis)


$$\rho_{x,y} = \frac{\sigma_{x,y}}{\sigma_x\,\sigma_y}$$

$$\sigma_{x,y} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$$

$$\sigma_x = \sqrt{\sum_i (x_i - \bar{x})^2}, \qquad \sigma_y = \sqrt{\sum_i (y_i - \bar{y})^2}$$
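As a sanity check, the formula above can be evaluated directly and compared against scipy. A minimal sketch on toy vectors (x and y here are made-up illustrative data, not the lecture's):

import numpy as np
import scipy.stats as stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# numerator: sum of co-deviations; denominators: root sums of squared deviations
sxy = np.sum((x - x.mean()) * (y - y.mean()))
sx = np.sqrt(np.sum((x - x.mean())**2))
sy = np.sqrt(np.sum((y - y.mean())**2))
rho = sxy / (sx * sy)

print(rho)                      # manual Pearson correlation
print(stats.pearsonr(x, y)[0])  # matches the library value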
In [82]: crime_d = crime[pd.to_datetime(crime.year, format="%Y") > pd.to_datetime('2015', format="%Y")]
         operations = {'DaysToReport': 'mean', 'Neighborhood': 'count'}
         crime_neigh = crime_d.groupby('Neighborhood').agg(operations)
         display(crime_neigh.head(5))
         crime_sum = crime_neigh.Neighborhood.sum()
         crime_neigh['Neighborhood'] = crime_neigh.Neighborhood/crime_sum * 100
         display(crime_neigh.head(5))

                 DaysToReport  Neighborhood
Neighborhood
ALASKA JUNCTION      3.265236          2330
ALKI                 3.798742           636
BALLARD NORTH        3.876259          3079
BALLARD SOUTH        3.725234          4815
BELLTOWN             2.343525          4023

                 DaysToReport  Neighborhood
Neighborhood
ALASKA JUNCTION      3.265236      1.644563
ALKI                 3.798742      0.448902
BALLARD NORTH        3.876259      2.173223
BALLARD SOUTH        3.725234      3.398528
BELLTOWN             2.343525      2.839518

In [83]: so.Plot(crime_neigh, x='DaysToReport', y='Neighborhood').add(so.Dots())
Out[83]:

In [84]: import scipy.stats as stats
         cor, pval = stats.spearmanr(crime_neigh.Neighborhood, crime_neigh.DaysToReport)
         display(cor)
-0.20139657939884478
In [85]: # truncated line completed with so.PolyFit as the trend smoother
         so.Plot(crime_neigh, x='DaysToReport', y='Neighborhood').add(so.Dots()).add(so.Line(color="0.2"), so.PolyFit())
Out[85]:

What about tertiary (or more) relationships?


Use a facet grid on a single variable (see the sketch below)
Heatmaps
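A hedged sketch of the facet-grid idea, reusing the crime data loaded above (the crime subset and column picks are illustrative choices, not from the lecture): a 2D precinct-by-daytime bar chart is repeated across panels of a third categorical variable.

# assumes crime, pd, and so are in scope from the cells above
CDP = pd.crosstab([crime.crimecat, crime.Precinct], crime.OccurredDayTime).stack().reset_index()
CDP.columns = ['Crime', 'Precinct', 'DayTime', 'Count']
# keep a handful of categories so the grid stays readable (illustrative subset)
subset = CDP[CDP.Crime.isin(['THEFT', 'BURGLARY', 'CAR PROWL', 'ROBBERY'])]
(so.Plot(subset, x='DayTime', y='Count', color='Precinct')
   .add(so.Bars(), so.Dodge())
   .facet('Crime', wrap=2)
   .show())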
