Ass 1B Part1 Solution
[2]: Id EmployeeName \
0 1 NATHANIEL FORD
1 2 GARY JIMENEZ
2 3 ALBERT PARDINI
3 4 CHRISTOPHER CHONG
4 5 PATRICK GARDNER
… … …
148649 148650 Roy I Tillery
148650 148651 Not provided
148651 148652 Not provided
148652 148653 Not provided
148653 148654 Joe Lopez
JobTitle BasePay \
0 GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY 167411.18
1 CAPTAIN III (POLICE DEPARTMENT) 155966.02
2 CAPTAIN III (POLICE DEPARTMENT) 212739.13
3 WIRE ROPE CABLE MAINTENANCE MECHANIC 77916.0
4 DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) 134401.6
… … …
148649 Custodian 0.00
148650 Not provided Not Provided
148651 Not provided Not Provided
148652 Not provided Not Provided
148653 Counselor, Log Cabin Ranch 0.00
OvertimePay OtherPay Benefits TotalPay TotalPayBenefits \
0 0.0 400184.25 NaN 567595.43 567595.43
1 245131.88 137811.38 NaN 538909.28 538909.28
2 106088.18 16452.6 NaN 335279.91 335279.91
3 56120.71 198306.9 NaN 332343.61 332343.61
4 9737.0 182234.59 NaN 326373.19 326373.19
… … … … … …
148649 0.00 0.00 0.00 0.00 0.00
148650 Not Provided Not Provided Not Provided 0.00 0.00
148651 Not Provided Not Provided Not Provided 0.00 0.00
148652 Not Provided Not Provided Not Provided 0.00 0.00
148653 0.00 -618.13 0.00 -618.13 -618.13
Year Notes Agency Status
0 2011 NaN San Francisco NaN
1 2011 NaN San Francisco NaN
2 2011 NaN San Francisco NaN
3 2011 NaN San Francisco NaN
4 2011 NaN San Francisco NaN
4. Use the .info() method to find out how many entries there are.
[4]: sal.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 148654 entries, 0 to 148653
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 148654 non-null int64
1 EmployeeName 148654 non-null object
2 JobTitle 148654 non-null object
3 BasePay 148049 non-null object
4 OvertimePay 148654 non-null object
5 OtherPay 148654 non-null object
6 Benefits 112495 non-null object
7 TotalPay 148654 non-null float64
8 TotalPayBenefits 148654 non-null float64
9 Year 148654 non-null int64
10 Notes 0 non-null float64
11 Agency 148654 non-null object
12 Status 38119 non-null object
dtypes: float64(3), int64(2), object(8)
memory usage: 14.7+ MB
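The `object` dtype reported for `BasePay`, `OvertimePay`, `OtherPay` and `Benefits` is consistent with the `Not Provided` strings visible in the preview above, since a column that mixes numbers and text cannot be stored as floats. A minimal sketch of one way to make such columns numeric is shown below. It runs on a small made-up frame named `toy` (a stand-in, not the real `sal` data) and is not necessarily the conversion used in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for `sal`; the values are invented for illustration.
toy = pd.DataFrame({
    "BasePay": ["167411.18", "Not Provided", "0.00", None],
    "Benefits": ["Not Provided", "12.5", None, "0.00"],
})

pay_columns = ["BasePay", "Benefits"]
for col in pay_columns:
    # errors="coerce" replaces anything that cannot be parsed as a number with NaN.
    toy[col] = pd.to_numeric(toy[col], errors="coerce")

print(toy.dtypes)
print(toy.isna().sum())
```

After this step the placeholder strings show up as missing values, so they are counted by `isna()` rather than hidden inside an `object` column.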
7. What is the job title of JOSEPH DRISCOLL?
[11]: print("The job title of Joseph Driscoll is: ",
            sal[sal['EmployeeName'] == 'JOSEPH DRISCOLL']['JobTitle'].iloc[0])
10. What is the name of the lowest paid person (including benefits)?
[16]: print("The name of lowest paid person is: ",
            sal[sal['TotalPayBenefits'] == sal['TotalPayBenefits'].min()]['EmployeeName'].iloc[0])
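An equivalent lookup can be written with `idxmin`, which returns the index label of the smallest value directly instead of building a boolean mask. The sketch below uses a made-up three-row frame named `toy` rather than the real `sal` data. If several rows tie for the minimum, `idxmin` returns the first one, which matches the `.iloc[0]` in the cell above.

```python
import pandas as pd

# Hypothetical stand-in for `sal`; names and amounts are invented.
toy = pd.DataFrame({
    "EmployeeName": ["A", "B", "C"],
    "TotalPayBenefits": [100.0, -5.0, 40.0],
})

# idxmin gives the index label of the minimum; .loc then reads that row's name.
lowest_label = toy["TotalPayBenefits"].idxmin()
lowest_name = toy.loc[lowest_label, "EmployeeName"]
print(lowest_name)
```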
11. What was the average (mean) BasePay of all employees per year (2011-2014)?
[24]: sal.groupby('Year')['BasePay'].mean()
[24]: Year
2011 63595.956517
2012 65436.406857
2013 69630.030216
2014 66564.421924
Name: BasePay, dtype: float64
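A grouped mean skips missing values by default, so each yearly average is taken over the rows that actually have a `BasePay`, not over every row in that year. The sketch below illustrates this on a made-up frame named `toy` in which `BasePay` is already numeric. The data and the variable names `per_year` and `rows_used` are assumptions for the example only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `sal` with one missing BasePay in 2011.
toy = pd.DataFrame({
    "Year": [2011, 2011, 2012, 2012],
    "BasePay": [100.0, np.nan, 50.0, 70.0],
})

grouped = toy.groupby("Year")["BasePay"]
per_year = grouped.mean()    # NaN rows are left out of each average
rows_used = grouped.count()  # number of non-missing values behind each average

print(per_year)
print(rows_used)
```

Comparing `rows_used` with the total number of rows per year shows how many records were excluded from each average.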
13. What are the top 5 most common jobs?
[29]: print("Top 5 most common jobs:\n",sal['JobTitle'].value_counts().head())
14. How many job titles were represented by only one person in 2013
(i.e., job titles with only one occurrence in 2013)?
[30]: sal[sal['Year'] == 2013]['JobTitle'].value_counts().eq(1).sum()
[30]: 202
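The count above comes from three steps: filter to one year, count how often each title occurs, then keep the titles whose count is exactly 1. The sketch below walks through the same steps on a small made-up frame named `toy` and also lists the single-occurrence titles, which can be a useful sanity check. The data and the name `singles` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for `sal`; titles and years are invented.
toy = pd.DataFrame({
    "Year": [2013, 2013, 2013, 2013, 2012],
    "JobTitle": ["Clerk", "Clerk", "Nurse", "Pilot", "Judge"],
})

# Occurrences of each title within 2013 only.
counts_2013 = toy.loc[toy["Year"] == 2013, "JobTitle"].value_counts()

# Titles that appear exactly once in that year.
singles = counts_2013[counts_2013 == 1]

print(len(singles))
print(sorted(singles.index))
```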
15. How many people have the word Chief in their job title?
[31]: sal[sal['JobTitle'].str.contains('Chief', case=False)]['EmployeeName'].count()
[31]: 627
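`str.contains` with `case=False` matches `Chief`, `CHIEF` and `chief` alike, and it matches substrings rather than whole words. A minimal sketch on made-up titles is given below. The `na=False` argument is an addition for the example: it treats a missing title as a non-match, which is unnecessary when the column has no nulls, as the `.info()` output suggests for `JobTitle`.

```python
import pandas as pd

# Hypothetical titles, including one missing value.
titles = pd.Series(["Chief of Police", "CHIEF ENGINEER", "Clerk", None])

# Case-insensitive substring match; missing titles count as False.
is_chief = titles.str.contains("Chief", case=False, na=False)

chief_count = int(is_chief.sum())
print(chief_count)
```

Summing the boolean mask counts rows, so a person who appears in several years would be counted once per row.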
16. Is there a correlation between length of the Job Title string and Salary?
[32]: sal['TitleLength'] = sal['JobTitle'].apply(len)
correlation = sal[['TitleLength', 'TotalPayBenefits']].corr().iloc[0, 1]
correlation
[32]: -0.03687844593260901
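A coefficient this close to zero indicates essentially no linear relationship between title length and total pay. The same Pearson coefficient can also be computed for just the two columns with `Series.corr`, without building the full correlation matrix. The sketch below checks that both routes agree on a small made-up frame named `toy`. The data are invented and chosen so the result is easy to verify by hand.

```python
import pandas as pd

# Hypothetical stand-in for `sal`; title lengths are 1, 2, 3, 4.
toy = pd.DataFrame({
    "JobTitle": ["A", "BB", "CCC", "DDDD"],
    "TotalPayBenefits": [40.0, 10.0, 30.0, 20.0],
})

toy["TitleLength"] = toy["JobTitle"].str.len()

# Pearson correlation between the two columns (the default method).
r_series = toy["TitleLength"].corr(toy["TotalPayBenefits"])

# Same value read from the 2x2 correlation matrix, as in the cell above.
r_matrix = toy[["TitleLength", "TotalPayBenefits"]].corr().iloc[0, 1]

print(r_series, r_matrix)
```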