Assignment Ds Midterm

Uploaded by

sadaamabdi993

In [3]: import numpy as np

import pandas as pd
import matplotlib.pyplot as plt

In [5]: df = pd.read_excel('C:/Users/hp/Desktop/juppyter/assign/Employee_Dataset_Pandas.xlsx')
df

Out[5]: Employee_ID Name Age Department Salary Joining_Date

0 E001 Laila Hussein 49 Operations 76794.67 2023-12-21

1 E002 Omar Abdullahi 59 IT 58597.15 2016-07-07

2 E003 NaN 23 Finance 53918.98 2023-05-09

3 E004 Abdullah Osman 50 Operations 46909.28 2010-07-17

4 E005 Sara Ismail 40 HR 500000.00 2017-01-13

... ... ... ... ... ... ...

97 E098 Sagal Ibrahim 44 Marketing 25848.25 2019-12-12

98 E099 Sara Abubakar 25 Marketing 21662.88 2010-11-14

99 E100 Hassan Abubakar 52 Operations 58625.36 2010-04-18

100 E001 Laila Hussein 49 Operations 76794.67 2023-12-21

101 E002 Omar Abdullahi 59 IT NaN 2016-07-07

102 rows × 6 columns

1. Average Salary by Department:

In [8]: avg_salary_by_Department = df.groupby('Department')['Salary'].mean()



In [9]: # Display avg_salary_by_Department

print(avg_salary_by_Department)

Department
Finance 55309.961111
HR 83892.270714
IT 54651.742500
Marketing 47693.682222
Operations 58020.254286
Sales 51101.622222
Name: Salary, dtype: float64
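
The table loaded earlier repeats Employee_IDs E001 and E002 in its last two rows and contains one salary of 500000.00 in HR, so the department means above may be skewed. The following is a minimal sketch, on hypothetical toy data rather than the assignment's spreadsheet, of dropping repeated IDs and reporting the median next to the mean. The name `toy` and all values in it are assumptions for illustration.

```python
import pandas as pd

# Toy data (hypothetical values) mimicking two issues visible in the table:
# a repeated Employee_ID and one extreme salary.
toy = pd.DataFrame({
    "Employee_ID": ["E001", "E002", "E003", "E004", "E001"],
    "Department": ["HR", "HR", "HR", "IT", "HR"],
    "Salary": [40000.0, 50000.0, 500000.0, 60000.0, 40000.0],
})

# Keep the first row for each Employee_ID so repeats are not counted twice.
deduped = toy.drop_duplicates(subset="Employee_ID", keep="first")

# Report mean and median per department; the median is less sensitive to outliers.
summary = deduped.groupby("Department")["Salary"].agg(["mean", "median"])
print(summary)
```

Whether repeated IDs should be dropped, and whether the 500000.00 salary is a data-entry error, are decisions about the data that this sketch does not settle.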

2. Employees Joining Each Year

In [6]: # Parse the joining dates; values that cannot be parsed become NaT
df['Joining_Date'] = pd.to_datetime(df['Joining_Date'], errors='coerce')
# Extract the year of each joining date
df['JoinYear'] = df['Joining_Date'].dt.year
# Count employees per joining year, in chronological order
# (alternative: df.groupby('JoinYear').size().reset_index(name='Employee_count'))
employees_joined_year = df['JoinYear'].value_counts().sort_index()
print(employees_joined_year)

JoinYear
2010.0 15
2011.0 5
2012.0 8
2013.0 4
2014.0 9
2015.0 9
2016.0 9
2017.0 5
2018.0 5
2019.0 8
2020.0 8
2021.0 5
2022.0 6
2023.0 5
Name: count, dtype: int64
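
The years above print as floats (2010.0, 2011.0, ...) and the counts sum to 101 of the 102 rows, which suggests that one joining date was missing or could not be parsed. The following is a small sketch, on hypothetical toy dates, of casting the year to pandas' nullable `Int64` dtype so that years display as integers while missing values are kept. The name `toy` and its values are assumptions for illustration.

```python
import pandas as pd

# Toy dates (hypothetical); the third entry cannot be parsed.
toy = pd.DataFrame(
    {"Joining_Date": ["2010-07-17", "2016-07-07", "not a date", "2010-04-18"]}
)

# Unparseable values become NaT instead of raising an error.
dates = pd.to_datetime(toy["Joining_Date"], errors="coerce")

# The nullable Int64 dtype keeps missing years as <NA> without turning the rest into floats.
years = dates.dt.year.astype("Int64")

# value_counts skips missing values by default.
counts = years.value_counts().sort_index()
print(counts)
print("Unparsed dates:", int(dates.isna().sum()))
```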

3. Salary Distribution by Department (any chart type)

In [5]: data = pd.read_excel('Employee_Dataset_Pandas.xlsx')

In [7]: # Line plot of salary against department, in row order
plt.plot(data['Department'], data['Salary'], marker='o', linestyle='-', color='yellow')
plt.xlabel("Department")
plt.ylabel("Salary")

Out[7]: Text(0, 0.5, 'Salary')
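
A line plot joins the rows in file order, so it does not really show how salaries are distributed within each department. One common alternative is a box plot with one box per department. The sketch below uses hypothetical toy data, and the names `toy` and `groups` are assumptions for illustration; it is not the chart the assignment produced.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy data (hypothetical values).
toy = pd.DataFrame({
    "Department": ["HR", "HR", "HR", "IT", "IT", "IT"],
    "Salary": [40000.0, 50000.0, 65000.0, 55000.0, 60000.0, 72000.0],
})

# One list of salaries per department, with missing salaries dropped.
groups = {
    dept: grp["Salary"].dropna().tolist()
    for dept, grp in toy.groupby("Department")
}

fig, ax = plt.subplots()
ax.boxplot(list(groups.values()))            # boxes are drawn at positions 1..n
ax.set_xticks(range(1, len(groups) + 1))
ax.set_xticklabels(list(groups.keys()))
ax.set_xlabel("Department")
ax.set_ylabel("Salary")
ax.set_title("Salary distribution by department")
```

In a notebook, ending the cell with `plt.show()` displays the figure.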

4. Age vs. Salary (Scatter Plot) or Line chart

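In [54]: # Scatter plot of salary against age; alpha makes overlapping points visible
plt.scatter(df['Age'], df['Salary'], alpha=0.7)
plt.xlabel('Age')
plt.ylabel('Salary')
plt.title('Age vs Salary')
plt.show()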

5. Gender Distribution by Department (if "Gender" is available)

In [59]: # The dataset has no Gender column, so count employees by Department and Age instead
Age_Distribution = df.groupby(['Department', 'Age']).size().unstack()
print(Age_Distribution)

Age 22 23 24 25 26 27 28 29 31 32 ... 49 50 \
Department ...
Finance NaN 1.0 NaN 1.0 NaN 2.0 NaN NaN NaN 1.0 ... NaN NaN
HR NaN NaN NaN 1.0 NaN 1.0 1.0 NaN 1.0 NaN ... NaN NaN
IT 1.0 NaN 1.0 1.0 1.0 NaN NaN 1.0 NaN NaN ... NaN 1.0
Marketing NaN 1.0 NaN 2.0 NaN 1.0 1.0 NaN NaN 1.0 ... NaN 1.0
Operations NaN NaN NaN 1.0 1.0 1.0 1.0 1.0 NaN NaN ... 4.0 1.0
Sales NaN 2.0 1.0 1.0 NaN 2.0 NaN NaN NaN 1.0 ... NaN NaN

Age 51 52 54 55 57 58 59 60
Department
Finance NaN NaN NaN NaN NaN NaN 1.0 NaN
HR 2.0 1.0 1.0 1.0 1.0 2.0 NaN NaN
IT 1.0 2.0 1.0 NaN NaN NaN 2.0 NaN
Marketing 1.0 NaN 1.0 NaN 1.0 1.0 NaN 1.0
Operations NaN 1.0 2.0 NaN 1.0 NaN 1.0 NaN
Sales NaN 1.0 NaN NaN NaN NaN 2.0 2.0

[6 rows x 35 columns]

The dataset has no Gender column, so this table shows the number of employees of each age in every department rather than a gender breakdown. NaN marks a department and age combination with no employees.
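
With 35 distinct ages, the table above is wide and mostly empty. The sketch below, on hypothetical toy data, groups ages into bands with `pd.cut` and falls back to that table only when no `Gender` column exists. The band edges, labels and the name `toy` are assumptions chosen for illustration.

```python
import pandas as pd

# Toy data (hypothetical values); note that there is no Gender column.
toy = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT", "Sales"],
    "Age": [25, 51, 22, 59, 60],
})

if "Gender" in toy.columns:
    # Preferred table when the column is available.
    table = pd.crosstab(toy["Department"], toy["Gender"])
else:
    # Fallback: count employees per age band; each bin includes its upper edge.
    bands = pd.cut(
        toy["Age"],
        bins=[20, 30, 40, 50, 60],
        labels=["21-30", "31-40", "41-50", "51-60"],
    )
    table = pd.crosstab(toy["Department"], bands)

print(table)
```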

6. Department with Highest Average Age

In [69]: # Mean age per department
Average_Age_by_Department = df.groupby('Department')['Age'].mean()
# Department label and value of the largest mean age
highest_Average_Department = Average_Age_by_Department.idxmax()
highest_Average_Age = Average_Age_by_Department.max()
print(f'Department with the highest average age: {highest_Average_Department} (Average Age: {highest_Average_Age})')

Department with the highest average age: HR (Average Age: 44.357142857142854)

7. Top 5 Highest-Paid Employees and Their Departments

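In [70]: # Select the columns of interest. No sorting or row limit is applied here,
# so all 102 rows are printed rather than only the five highest salaries.
top_5_highest_paid = df[['Employee_ID', 'Department', 'Salary']]
print(top_5_highest_paid)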

Employee_ID Department Salary


0 E001 Operations 76794.67
1 E002 IT 58597.15
2 E003 Finance 53918.98
3 E004 Operations 46909.28
4 E005 HR 500000.00
.. ... ... ...
97 E098 Marketing 25848.25
98 E099 Marketing 21662.88
99 E100 Operations 58625.36
100 E001 Operations 76794.67
101 E002 IT NaN

[102 rows x 3 columns]
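
The output above lists every row, so it does not yet answer the question. One way to rank the rows is `DataFrame.nlargest`, sketched below on toy data. The first five rows reuse values visible in the table above; the E006 row and the name `toy` are hypothetical. Dropping repeated Employee_IDs before ranking is an assumption about how duplicates should be treated.

```python
import pandas as pd

# Toy data: E001-E005 copy values shown in the table; E006 is hypothetical.
toy = pd.DataFrame({
    "Employee_ID": ["E001", "E002", "E003", "E004", "E005", "E006", "E001"],
    "Department": ["Operations", "IT", "Finance", "Operations", "HR", "Sales", "Operations"],
    "Salary": [76794.67, 58597.15, 53918.98, 46909.28, 500000.00, 30000.00, 76794.67],
})

# Drop repeated employees, then keep the five rows with the largest salaries.
top_5 = (
    toy.drop_duplicates(subset="Employee_ID")
       .nlargest(5, "Salary")[["Employee_ID", "Department", "Salary"]]
)
print(top_5)
```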
