
Ass 1B Part1 Solution

Machine learning

Uploaded by soumyajitpaul748
Copyright © All Rights Reserved

Assignment_1B_Part_I

August 31, 2023

0.1 1. Import pandas as pd.


[1]: import pandas as pd

0.2 2. Read Salaries.csv as a dataframe called sal.


[2]: sal=pd.read_csv("Salaries.csv")
sal

/tmp/ipykernel_17800/185865281.py:1: DtypeWarning: Columns (3,4,5,6,12) have mixed types. Specify dtype option on import or set low_memory=False.
  sal=pd.read_csv("Salaries.csv")

[2]: Id EmployeeName \
0 1 NATHANIEL FORD
1 2 GARY JIMENEZ
2 3 ALBERT PARDINI
3 4 CHRISTOPHER CHONG
4 5 PATRICK GARDNER
… … …
148649 148650 Roy I Tillery
148650 148651 Not provided
148651 148652 Not provided
148652 148653 Not provided
148653 148654 Joe Lopez

JobTitle BasePay \
0 GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY 167411.18
1 CAPTAIN III (POLICE DEPARTMENT) 155966.02
2 CAPTAIN III (POLICE DEPARTMENT) 212739.13
3 WIRE ROPE CABLE MAINTENANCE MECHANIC 77916.0
4 DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) 134401.6
… … …
148649 Custodian 0.00
148650 Not provided Not Provided
148651 Not provided Not Provided
148652 Not provided Not Provided
148653 Counselor, Log Cabin Ranch 0.00

OvertimePay OtherPay Benefits TotalPay TotalPayBenefits \
0 0.0 400184.25 NaN 567595.43 567595.43
1 245131.88 137811.38 NaN 538909.28 538909.28
2 106088.18 16452.6 NaN 335279.91 335279.91
3 56120.71 198306.9 NaN 332343.61 332343.61
4 9737.0 182234.59 NaN 326373.19 326373.19
… … … … … …
148649 0.00 0.00 0.00 0.00 0.00
148650 Not Provided Not Provided Not Provided 0.00 0.00
148651 Not Provided Not Provided Not Provided 0.00 0.00
148652 Not Provided Not Provided Not Provided 0.00 0.00
148653 0.00 -618.13 0.00 -618.13 -618.13

Year Notes Agency Status
0 2011 NaN San Francisco NaN
1 2011 NaN San Francisco NaN
2 2011 NaN San Francisco NaN
3 2011 NaN San Francisco NaN
4 2011 NaN San Francisco NaN
… … … … …
148649 2014 NaN San Francisco PT
148650 2014 NaN San Francisco NaN
148651 2014 NaN San Francisco NaN
148652 2014 NaN San Francisco NaN
148653 2014 NaN San Francisco PT

[148654 rows x 13 columns]
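The DtypeWarning appears because the pay columns mix numbers with the placeholder text "Not Provided", as the tail rows above show. The sketch below is a minimal example that assumes "Not Provided" is the only non-numeric marker in those columns. It uses an invented three-row CSV rather than the real file, and declares the placeholder as a missing-value marker at load time. Passing `low_memory=False`, as the warning itself suggests, is the other documented option.

```python
import io

import pandas as pd

# Hypothetical three-row extract shaped like Salaries.csv (not real data).
csv_text = (
    "Id,EmployeeName,BasePay,OvertimePay\n"
    "1,NATHANIEL FORD,167411.18,0.0\n"
    "2,GARY JIMENEZ,155966.02,245131.88\n"
    "3,Not provided,Not Provided,Not Provided\n"
)

# Treating the placeholder as NaN lets pandas parse the pay columns as float64
# instead of object. Matching is exact, so the name "Not provided" is kept as text.
demo = pd.read_csv(io.StringIO(csv_text), na_values=["Not Provided"])
print(demo.dtypes)
```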

0.3 3. Check the head of the DataFrame.


[3]: sal.head()

[3]: Id EmployeeName JobTitle \
0 1 NATHANIEL FORD GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY
1 2 GARY JIMENEZ CAPTAIN III (POLICE DEPARTMENT)
2 3 ALBERT PARDINI CAPTAIN III (POLICE DEPARTMENT)
3 4 CHRISTOPHER CHONG WIRE ROPE CABLE MAINTENANCE MECHANIC
4 5 PATRICK GARDNER DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT)

BasePay OvertimePay OtherPay Benefits TotalPay TotalPayBenefits \
0 167411.18 0.0 400184.25 NaN 567595.43 567595.43
1 155966.02 245131.88 137811.38 NaN 538909.28 538909.28
2 212739.13 106088.18 16452.6 NaN 335279.91 335279.91
3 77916.0 56120.71 198306.9 NaN 332343.61 332343.61
4 134401.6 9737.0 182234.59 NaN 326373.19 326373.19

Year Notes Agency Status
0 2011 NaN San Francisco NaN
1 2011 NaN San Francisco NaN
2 2011 NaN San Francisco NaN
3 2011 NaN San Francisco NaN
4 2011 NaN San Francisco NaN

0.4 4. Use the .info() method to find out how many entries there are.
[4]: sal.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 148654 entries, 0 to 148653
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 148654 non-null int64
1 EmployeeName 148654 non-null object
2 JobTitle 148654 non-null object
3 BasePay 148049 non-null object
4 OvertimePay 148654 non-null object
5 OtherPay 148654 non-null object
6 Benefits 112495 non-null object
7 TotalPay 148654 non-null float64
8 TotalPayBenefits 148654 non-null float64
9 Year 148654 non-null int64
10 Notes 0 non-null float64
11 Agency 148654 non-null object
12 Status 38119 non-null object
dtypes: float64(3), int64(2), object(8)
memory usage: 14.7+ MB
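`.info()` reports 148654 entries and a non-null count per column. The sketch below shows the same information as plain values that can be reused in code. It is a minimal example on a hypothetical toy frame standing in for `sal`. `shape` gives the number of rows and columns, and `isna().sum()` gives the missing count per column, which is the complement of the non-null counts shown by `.info()`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for sal; values are illustrative only.
demo = pd.DataFrame({
    "Id": [1, 2, 3],
    "Benefits": [np.nan, 1200.5, np.nan],
    "Status": ["PT", None, "FT"],
})

n_rows, n_cols = demo.shape   # number of entries and number of columns
missing = demo.isna().sum()   # missing values per column
print(n_rows, n_cols)
print(missing)
```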

0.5 5. What is the average BasePay?


[7]: sal['BasePay'] = pd.to_numeric(sal['BasePay'], errors='coerce')  # text such as 'Not Provided' becomes NaN; the column becomes float64

print("The average Base Pay is: ", sal['BasePay'].mean())

The average Base Pay is: 66325.44884050643
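The conversion in cell [7] does not turn floats into integers. It parses text into numbers and replaces anything unparseable with NaN, and `mean()` then skips those NaN values by default. The sketch below illustrates this on a hypothetical four-value column.

```python
import pandas as pd

# Hypothetical object column mixing numeric text and a placeholder.
base_pay = pd.Series(["100.0", "200.0", "Not Provided", "300.0"])

# errors="coerce" turns unparseable entries into NaN; the result is float64.
numeric = pd.to_numeric(base_pay, errors="coerce")

# Series.mean() uses skipna=True by default, so the mean is over 3 values.
print(numeric.dtype, numeric.isna().sum(), numeric.mean())
```

With the default `errors="raise"`, the same call would fail on "Not Provided". Coercion keeps the other rows, so the 605 rows where `BasePay` is missing or a placeholder simply drop out of the average.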

0.6 6. What is the highest amount of OvertimePay in the dataset?


[8]: sal['OvertimePay'] = pd.to_numeric(sal['OvertimePay'], errors='coerce')
print("The highest amount of overtime pay in the dataset is: ", sal['OvertimePay'].max())

The highest amount of overtime pay in the dataset is: 245131.88
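`max()` returns only the value. To see whose row it belongs to, `idxmax()` returns the index label of the first maximum, and `.loc` can then fetch the full row. The sketch below is a minimal example on hypothetical toy data; the names and amounts are invented.

```python
import pandas as pd

# Hypothetical toy data; names and amounts are illustrative only.
demo = pd.DataFrame({
    "EmployeeName": ["A", "B", "C"],
    "OvertimePay": [0.0, 245.5, 99.0],
})

# idxmax skips NaN by default and returns the label of the first maximum.
top_row = demo.loc[demo["OvertimePay"].idxmax()]
print(top_row["EmployeeName"], top_row["OvertimePay"])
```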

3
0.7 7. What is the job title of JOSEPH DRISCOLL?
[11]: print("The job title of Joseph Driscoll is: ", sal[sal['EmployeeName'] == 'JOSEPH DRISCOLL']['JobTitle'].iloc[0])

The job title of Joseph Driscoll is: CAPTAIN, FIRE SUPPRESSION
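The lookup in cell [11] compares names exactly and keeps only the first hit via `.iloc[0]`. In the output of cell [2], the 2011 names are upper case, while 2014 names such as "Joe Lopez" are mixed case. An exact upper-case match may therefore miss the same person in later years. The sketch below returns every matching row, assuming case is the only difference between spellings. The names and titles are invented for illustration.

```python
import pandas as pd

# Hypothetical rows: one person recorded in upper case in one year and
# title case in another (names and titles are invented).
demo = pd.DataFrame({
    "EmployeeName": ["JANE DOE", "Jane Doe", "John Roe"],
    "JobTitle": ["CLERK", "Clerk", "Custodian"],
    "Year": [2011, 2012, 2012],
})

# Upper-casing the column makes the comparison case-insensitive.
mask = demo["EmployeeName"].str.upper() == "JANE DOE"
matches = demo.loc[mask, ["Year", "JobTitle"]]
print(matches)
```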

0.8 8. How much does JOSEPH DRISCOLL make (including benefits)?


[13]: print("The Total Pay Benefits of Joseph Driscoll is:", sal[sal['EmployeeName'] == 'JOSEPH DRISCOLL']['TotalPayBenefits'].iloc[0])

The Total Pay Benefits of Joseph Driscoll is: 270324.91
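If the name were misspelled, `.iloc[0]` on an empty selection would raise an `IndexError`. The sketch below wraps the lookup in a hypothetical helper, `total_pay_benefits`, which is not part of the assignment. The helper returns every recorded value and fails with a clearer message when nothing matches. The toy data is invented.

```python
import pandas as pd


def total_pay_benefits(df: pd.DataFrame, name: str) -> pd.Series:
    """Hypothetical helper: all TotalPayBenefits values recorded for an exact name."""
    mask = df["EmployeeName"] == name
    if not mask.any():
        # Clearer than the IndexError that .iloc[0] raises on an empty selection.
        raise KeyError(f"no rows for {name!r}")
    return df.loc[mask, "TotalPayBenefits"]


demo = pd.DataFrame({
    "EmployeeName": ["JANE DOE", "JANE DOE", "JOHN ROE"],
    "Year": [2011, 2012, 2011],
    "TotalPayBenefits": [1000.0, 1100.0, 900.0],
})
print(total_pay_benefits(demo, "JANE DOE").tolist())
```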

0.9 9. What is the name of highest paid person (including benefits)?


[15]: print("The name of highest paid person is: ", sal[sal['TotalPayBenefits'] == sal['TotalPayBenefits'].max()]['EmployeeName'].iloc[0])

The name of highest paid person is: NATHANIEL FORD
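Filtering on `== max()` finds the single top earner. To see whether that person stands far above the next few, `DataFrame.nlargest` returns the n rows with the largest values in a column, in descending order. The sketch below is a minimal example on hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data; names and amounts are illustrative only.
demo = pd.DataFrame({
    "EmployeeName": ["A", "B", "C", "D"],
    "TotalPayBenefits": [500.0, 900.0, 700.0, 100.0],
})

# The two highest-paid rows, largest first.
top = demo.nlargest(2, "TotalPayBenefits")[["EmployeeName", "TotalPayBenefits"]]
print(top)
```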

0.10 10. What is the name of lowest paid person (including benefits)?
[16]: print("The name of lowest paid person is: ", sal[sal['TotalPayBenefits'] == sal['TotalPayBenefits'].min()]['EmployeeName'].iloc[0])

The name of lowest paid person is: Joe Lopez
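The last row of cell [2] shows that this minimum is a negative total (-618.13). If "lowest paid" should mean the smallest positive total instead, the zero and negative rows can be filtered out first. The sketch below shows both variants with `idxmin` on hypothetical toy data. Which definition fits the question is a judgment call, not something the dataset states.

```python
import pandas as pd

# Hypothetical toy data; names are invented.
demo = pd.DataFrame({
    "EmployeeName": ["A", "B", "C"],
    "TotalPayBenefits": [-618.13, 0.0, 250.0],
})

# Overall minimum, which may be negative.
lowest = demo.loc[demo["TotalPayBenefits"].idxmin(), "EmployeeName"]

# Lowest strictly positive total, excluding zero and negative rows.
positive = demo[demo["TotalPayBenefits"] > 0]
lowest_positive = positive.loc[positive["TotalPayBenefits"].idxmin(), "EmployeeName"]
print(lowest, lowest_positive)
```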

0.11 11. What was the average (mean) BasePay of all employees per year (2011-2014)?
[24]: sal.groupby('Year')['BasePay'].mean()

[24]: Year
2011 63595.956517
2012 65436.406857
2013 69630.030216
2014 66564.421924
Name: BasePay, dtype: float64
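Each yearly mean above ignores the NaN values created by the coercion in cell [7]. Reporting the non-null count next to each mean shows how many rows actually contribute. The sketch below is a minimal example using `agg` on hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data; None becomes NaN in a float column.
demo = pd.DataFrame({
    "Year": [2011, 2011, 2012, 2012, 2012],
    "BasePay": [100.0, 300.0, 200.0, None, 400.0],
})

# "mean" skips NaN; "count" reports how many non-null values each mean uses.
summary = demo.groupby("Year")["BasePay"].agg(["mean", "count"])
print(summary)
```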

0.12 12. How many unique job titles are there?


[26]: print("Number of Unique job titles: ",sal['JobTitle'].nunique())

Number of Unique job titles: 2159
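`nunique()` treats titles that differ only in letter case as distinct. The rows shown in cell [2] mix upper-case titles (2011) with mixed-case ones (2014), so the count of 2159 may include some case variants of the same title. That is an assumption worth checking, not a shown result. The sketch below compares raw and case-normalised counts on hypothetical titles.

```python
import pandas as pd

# Hypothetical titles; the first two differ only in case.
titles = pd.Series([
    "CAPTAIN III (POLICE DEPARTMENT)",
    "Captain III (Police Department)",
    "Custodian",
])

raw = titles.nunique()                     # case variants counted separately
normalised = titles.str.upper().nunique()  # case variants counted once
print(raw, normalised)
```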

0.13 13. What are the top 5 most common jobs?
[29]: print("Top 5 most common jobs:\n",sal['JobTitle'].value_counts().head())

Top 5 most common jobs:
Transit Operator 7036
Special Nurse 4389
Registered Nurse 3736
Public Svc Aide-Public Works 2518
Police Officer 3 2421
Name: JobTitle, dtype: int64
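`value_counts()` sorts by frequency, so `.head()` gives the most common titles. Passing `normalize=True` returns the same ranking as fractions of all rows, which is easier to compare across years of different size. The sketch below is a minimal example on hypothetical job titles.

```python
import pandas as pd

# Hypothetical job titles.
jobs = pd.Series(["Nurse", "Driver", "Nurse", "Clerk", "Nurse", "Driver"])

counts = jobs.value_counts().head(2)                # absolute counts, most common first
shares = jobs.value_counts(normalize=True).head(2)  # same ranking as fractions of all rows
print(counts)
print(shares)
```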

0.14 14. How many Job Titles were represented by only one person in 2013?
(i.e., Job Titles with only one occurrence in 2013)
[30]: sal[sal['Year'] == 2013]['JobTitle'].value_counts().eq(1).sum()

[30]: 202
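Cell [30] counts the titles that occur exactly once. Keeping the intermediate Series also shows which titles they are. The sketch below is a minimal example on hypothetical 2013/2014 rows.

```python
import pandas as pd

# Hypothetical rows; only the 2013 ones are considered.
demo = pd.DataFrame({
    "Year": [2013, 2013, 2013, 2013, 2014],
    "JobTitle": ["Clerk", "Clerk", "Chef", "Pilot", "Chef"],
})

# Occurrences per title in 2013, then the titles that appear exactly once.
per_title = demo.loc[demo["Year"] == 2013, "JobTitle"].value_counts()
single_titles = per_title[per_title == 1]
print(len(single_titles), sorted(single_titles.index))
```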

0.15 15. How many people have the word Chief in their job title?
[31]: sal[sal['JobTitle'].str.contains('Chief', case=False)]['EmployeeName'].count()

[31]: 627
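`str.contains('Chief', case=False)` is a substring match, so it would also count any title in which "chief" is part of a longer word. A regex word boundary restricts the match to the whole word, and `na=False` treats missing titles as non-matches. The sketch below contrasts the two on invented titles. Whether the real data contains such longer words is not shown in this notebook.

```python
import pandas as pd

# Hypothetical titles, including one where "chief" is inside a longer word.
titles = pd.Series(["CHIEF OF POLICE", "Asst Chief", "Mischief Analyst", None])

# Case-insensitive substring match; missing titles count as False.
substring = titles.str.contains("chief", case=False, na=False)

# Whole-word match: \b requires a word boundary on both sides of "chief".
whole_word = titles.str.contains(r"\bchief\b", case=False, na=False, regex=True)
print(substring.sum(), whole_word.sum())
```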

0.16 16. Is there a correlation between the length of the Job Title string and Salary?
[32]: sal['TitleLength'] = sal['JobTitle'].apply(len)
correlation = sal[['TitleLength', 'TotalPayBenefits']].corr().iloc[0, 1]
correlation

[32]: -0.03687844593260901
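A Pearson coefficient of about -0.037 is close to zero, so title length shows essentially no linear relationship with `TotalPayBenefits` in this data. Pearson only measures linear association. A rank-based (Spearman) coefficient checks for any monotonic trend instead. The sketch below computes Spearman as the Pearson correlation of the ranks, which avoids any extra dependency. The titles and pay values are invented.

```python
import pandas as pd

# Hypothetical titles and pay values, invented for illustration.
demo = pd.DataFrame({
    "JobTitle": ["Clerk", "Senior Clerk", "Principal Administrative Analyst", "Chef"],
    "TotalPayBenefits": [50_000.0, 60_000.0, 90_000.0, 55_000.0],
})

title_len = demo["JobTitle"].str.len()  # matches .apply(len) when no title is missing
pay = demo["TotalPayBenefits"]

pearson = title_len.corr(pay)                 # Pearson (the default): linear association
spearman = title_len.rank().corr(pay.rank())  # Spearman: Pearson correlation of the ranks
print(pearson, spearman)
```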
