DSBDA3 - Jupyter Notebook

The document is a Jupyter Notebook containing practical exercises focused on analyzing an Employee Salary Dataset using Python libraries such as pandas, numpy, and matplotlib. It includes operations to calculate mean, median, mode, minimum, maximum, and standard deviation for various attributes like experience, age, and salary. Additionally, it demonstrates data grouping and one-hot encoding for the salary variable.
3/25/25, 2:05 AM DSBDA3 - Jupyter Notebook

Name :- Maithili Kishor Narkhede


Roll No. :- COTA28

# Practical 3
In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [38]: df = pd.read_csv("Employee_Salary_Dataset.csv")
df.head()

Out[38]:
   ID  Experience_Years  Age  Gender  Salary
0   1                 5   28  Female  250000
1   2                 1   21    Male   50000
2   3                 3   23  Female  170000
3   4                 2   22    Male   25000
4   5                 1   17    Male   10000

In [5]: df.mean(numeric_only=True)

Out[5]: ID 1.800000e+01
Experience_Years 9.200000e+00
Age 3.548571e+01
Salary 2.059147e+06
dtype: float64

In [6]: df.loc[:,'Age'].mean()

Out[6]: 35.48571428571429

In [8]: df.mean( axis = 1, numeric_only=True )[ 0 : 4 ]

Out[8]: 0 62508.50
1 12506.00
2 42507.25
3 6257.00
dtype: float64
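With `axis=1` the mean is taken across the numeric columns of each row, so ID, experience, age and salary are averaged together. The sketch below rebuilds the first four rows printed by `df.head()` (the name `head_rows` is introduced here for illustration only) and recomputes the row means by the same call. Because Salary is several orders of magnitude larger than the other columns, it dominates every row-wise statistic.

```python
import pandas as pd

# First four rows copied from the df.head() output above.
head_rows = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Experience_Years": [5, 1, 3, 2],
    "Age": [28, 21, 23, 22],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Salary": [250000, 50000, 170000, 25000],
})

# axis=1 averages across columns; numeric_only=True skips Gender.
row_means = head_rows.mean(axis=1, numeric_only=True)

# The same calculation by hand for row 0.
manual_row0 = (1 + 5 + 28 + 250000) / 4

print(row_means.tolist())
print(manual_row0)
```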

In [10]: df.median(numeric_only=True)

Out[10]: ID 18.0
Experience_Years 6.0
Age 29.0
Salary 250000.0
dtype: float64

localhost:8891/notebooks/DSBDA/DSBDA3.ipynb 1/12

In [11]: df.loc[:, 'Age'].median()

Out[11]: 29.0

In [12]: df.median(axis=1,numeric_only=True)[0:4]

Out[12]: 0 16.5
1 11.5
2 13.0
3 13.0
dtype: float64

In [13]: df.mode()

Out[13]:
    ID  Experience_Years   Age  Gender    Salary
 0   1               2.0  54.0  Female   25000.0
 1   2               NaN   NaN     NaN  250000.0
 2   3               NaN   NaN     NaN       NaN
 3   4               NaN   NaN     NaN       NaN
 4   5               NaN   NaN     NaN       NaN
 5   6               NaN   NaN     NaN       NaN
 6   7               NaN   NaN     NaN       NaN
 7   8               NaN   NaN     NaN       NaN
 8   9               NaN   NaN     NaN       NaN
 9  10               NaN   NaN     NaN       NaN
10  11               NaN   NaN     NaN       NaN
11  12               NaN   NaN     NaN       NaN
12  13               NaN   NaN     NaN       NaN
13  14               NaN   NaN     NaN       NaN
14  15               NaN   NaN     NaN       NaN
15  16               NaN   NaN     NaN       NaN
16  17               NaN   NaN     NaN       NaN
17  18               NaN   NaN     NaN       NaN
18  19               NaN   NaN     NaN       NaN
19  20               NaN   NaN     NaN       NaN
20  21               NaN   NaN     NaN       NaN
21  22               NaN   NaN     NaN       NaN
22  23               NaN   NaN     NaN       NaN
23  24               NaN   NaN     NaN       NaN
24  25               NaN   NaN     NaN       NaN
25  26               NaN   NaN     NaN       NaN
26  27               NaN   NaN     NaN       NaN
27  28               NaN   NaN     NaN       NaN
28  29               NaN   NaN     NaN       NaN
29  30               NaN   NaN     NaN       NaN
30  31               NaN   NaN     NaN       NaN
31  32               NaN   NaN     NaN       NaN
32  33               NaN   NaN     NaN       NaN
33  34               NaN   NaN     NaN       NaN
34  35               NaN   NaN     NaN       NaN
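`DataFrame.mode()` lists every modal value of each column, one per row, and pads the shorter columns with NaN. Every ID occurs exactly once, so all 35 IDs count as modes and the table has 35 rows; Salary has two modes (25000 and 250000) and the remaining columns have one each. A small sketch with made-up values (the frame `toy` is hypothetical, not part of the dataset) shows the same padding:

```python
import pandas as pd

# Hypothetical toy data: ID is all unique, Age has one mode,
# Salary has a two-way tie.
toy = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Age": [21, 54, 54, 23, 54],
    "Salary": [25000, 25000, 250000, 250000, 9000],
})

# One row per modal value; shorter columns are padded with NaN.
modes = toy.mode()
print(modes)

# Number of modal values per column (count() ignores NaN).
print(modes.count().to_dict())
```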

In [14]: df.loc[:,'Age'].mode()

Out[14]: 0 54
Name: Age, dtype: int64

In [16]: df.select_dtypes(include=[np.number]).mode(axis=1)[0:4]

Out[16]:
     0    1     2         3
0  1.0  5.0  28.0  250000.0
1  1.0  2.0  21.0   50000.0
2  3.0  NaN   NaN       NaN
3  2.0  4.0  22.0   25000.0

In [17]: df.min()

Out[17]: ID 1
Experience_Years 1
Age 17
Gender Female
Salary 3000
dtype: object

In [18]: df.loc[:,'Salary'].min(skipna = False)

Out[18]: 3000

In [19]: df.max()

Out[19]: ID 35
Experience_Years 27
Age 62
Gender Male
Salary 10000000
dtype: object

In [20]: df.loc[:,'Salary'].max(skipna = False)

Out[20]: 10000000

In [21]: df.std(numeric_only=True)

Out[21]: ID 1.024695e+01
Experience_Years 7.552950e+00
Age 1.464355e+01
Salary 3.170124e+06
dtype: float64

In [22]: df.loc[:,'Age'].std()

Out[22]: 14.643551940884361
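pandas computes the sample standard deviation by default (`ddof=1`, dividing by n - 1), whereas `numpy.std` defaults to the population form (`ddof=0`). The following sketch uses only the five ages shown in the `df.head()` output, not the full column, to put the two side by side; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Ages of the first five employees, taken from the df.head() output.
ages = pd.Series([28, 21, 23, 22, 17], name="Age")

sample_std = ages.std()                    # pandas default: ddof=1
population_std = np.std(ages.to_numpy())   # NumPy default: ddof=0

# The sample formula written out: sqrt(sum((x - mean)^2) / (n - 1))
manual = np.sqrt(((ages - ages.mean()) ** 2).sum() / (len(ages) - 1))

print(sample_std, population_std, manual)
```

Passing `ddof=1` to `numpy.std` makes the two libraries agree.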

In [24]: df.std(axis=1,numeric_only=True)[0:4]

Out[24]: 0 124994.333900
1 24996.001694
2 84995.167190
3 12495.336570
dtype: float64

In [25]: df.groupby(['Salary'])['Age'].mean()

Out[25]: Salary
3000 18.0
6000 21.0
6100 21.0
7500 23.0
8900 23.0
9000 21.0
10000 17.0
15000 21.0
20000 22.0
25000 24.0
50000 21.0
61500 36.0
80000 34.0
87000 27.0
170000 23.0
220100 40.0
250000 27.0
330000 36.0
650000 54.0
800000 54.0
900000 54.0
930000 34.0
1400000 29.0
1540000 55.0
5000000 54.0
5001000 62.0
6000050 39.0
6570000 54.0
6845000 29.0
7600000 49.0
7900000 54.0
9300000 53.0
10000000 62.0
Name: Age, dtype: float64
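Grouping by Salary creates one group per distinct salary value. Most salaries occur only once, so the group mean is simply that employee's age; only 25000 and 250000 are shared by two employees each. A minimal sketch using five rows taken from the employee table (the frame name `sample` is illustrative):

```python
import pandas as pd

# Five rows copied from the employee table (IDs 1, 2, 4, 11 and 22).
sample = pd.DataFrame({
    "ID": [1, 2, 4, 11, 22],
    "Age": [28, 21, 22, 26, 26],
    "Salary": [250000, 50000, 25000, 250000, 25000],
})

# One group per distinct Salary; Age is averaged within each group.
mean_age = sample.groupby("Salary")["Age"].mean()

# How many employees fall into each group.
group_size = sample.groupby("Salary").size()

print(mean_age)
print(group_size)
```

The means for 25000 and 250000 (24.0 and 27.0) agree with the corresponding entries in the grouped output above.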

In [26]: # The original call renamed 'healthy_eating)' to 'Age'; df has no such column,
# so the rename was a no-op. A plain copy gives the same df_u.
df_u = df.copy()

In [27]: (df_u.groupby(['Salary']).Age.mean())

Out[27]: Salary
3000 18.0
6000 21.0
6100 21.0
7500 23.0
8900 23.0
9000 21.0
10000 17.0
15000 21.0
20000 22.0
25000 24.0
50000 21.0
61500 36.0
80000 34.0
87000 27.0
170000 23.0
220100 40.0
250000 27.0
330000 36.0
650000 54.0
800000 54.0
900000 54.0
930000 34.0
1400000 29.0
1540000 55.0
5000000 54.0
5001000 62.0
6000050 39.0
6570000 54.0
6845000 29.0
7600000 49.0
7900000 54.0
9300000 53.0
10000000 62.0
Name: Age, dtype: float64

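In [28]: from sklearn import preprocessing

# One-hot encode Salary: each distinct salary value becomes its own 0/1 column.
enc = preprocessing.OneHotEncoder()
# fit_transform returns a sparse matrix; toarray() makes it dense for the DataFrame.
enc_df = pd.DataFrame(enc.fit_transform(df[['Salary']]).toarray())
enc_df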

Out[28]:
0 1 2 3 4 5 6 7 8 9 ... 23 24 25 26 27 28 29 30 31

0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

4 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0

6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

7 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

10 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

11 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

12 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0

13 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

14 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

15 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

16 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

17 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0

18 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

19 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

20 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

21 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

22 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

23 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

24 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

25 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

26 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

27 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

28 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

29 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

30 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

31 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

32 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

33 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0

34 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0

35 rows × 33 columns
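The encoded frame has 33 columns because the 35 rows contain 33 distinct salaries, and its columns are labelled only by position. As an alternative sketch (this swaps in `pandas.get_dummies` for scikit-learn's `OneHotEncoder`; it is not what the notebook ran), the columns can carry the salary value in their names. The frame `sample` and the `Salary_` prefix are choices made for this illustration.

```python
import pandas as pd

# Five rows copied from the employee table; 250000 and 25000 each appear twice.
sample = pd.DataFrame({
    "ID": [1, 2, 4, 11, 22],
    "Salary": [250000, 50000, 25000, 250000, 25000],
})

# One 0/1 column per distinct salary, named "Salary_<value>".
dummies = pd.get_dummies(sample["Salary"], prefix="Salary", dtype=int)

# join aligns the two frames on their row index.
encoded = sample.join(dummies)
print(encoded)
```

Each row has exactly one 1 among the indicator columns, and five rows with three distinct salaries give three indicator columns.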

In [29]: print(df.columns)

Index(['ID', 'Experience_Years', 'Age', 'Gender', 'Salary'], dtype='object')

In [31]: df_encode = df_u.join(enc_df)
df_encode

Out[31]:
ID Experience_Years Age Gender Salary 0 1 2 3 4 ... 23 24 25 2

0 1 5 28 Female 250000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

1 2 1 21 Male 50000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

2 3 3 23 Female 170000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

3 4 2 22 Male 25000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

4 5 1 17 Male 10000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

5 6 25 62 Male 5001000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 0

6 7 19 54 Female 800000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

7 8 2 21 Female 9000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

8 9 10 36 Female 61500 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

9 10 15 54 Female 650000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

10 11 4 26 Female 250000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

11 12 6 29 Male 1400000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

12 13 14 39 Male 6000050 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 1

13 14 11 40 Male 220100 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

14 15 2 23 Male 7500 0.0 0.0 0.0 1.0 0.0 ... 0.0 0.0 0.0 0

15 16 4 27 Female 87000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

16 17 10 34 Female 930000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

17 18 15 54 Female 7900000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

18 19 2 21 Male 15000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

19 20 10 36 Male 330000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

20 21 15 54 Male 6570000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

21 22 4 26 Male 25000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

22 23 5 29 Male 6845000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

23 24 1 21 Female 6000 0.0 1.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

24 25 4 23 Female 8900 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0

25 26 3 22 Female 20000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

26 27 1 18 Male 3000 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

27 28 27 62 Female 10000000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

28 29 19 54 Female 5000000 0.0 0.0 0.0 0.0 0.0 ... 0.0 1.0 0.0 0

29 30 2 21 Female 6100 0.0 0.0 1.0 0.0 0.0 ... 0.0 0.0 0.0 0

30 31 10 34 Male 80000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

31 32 15 54 Male 900000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

32 33 20 55 Female 1540000 0.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0

33 34 19 53 Female 9300000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

34 35 16 49 Male 7600000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0

35 rows × 38 columns

In [32]: iris = pd.read_csv("Iris.csv")

In [33]: import pandas as pd

In [34]: irisSet=(iris['Species']=='Iris-setosa')
print('Iris-setosa')
print(iris[irisSet].describe())

Iris-setosa
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm
count 50.00000 50.00000 50.000000 50.000000 50.00000
mean 25.50000 5.00600 3.418000 1.464000 0.24400
std 14.57738 0.35249 0.381024 0.173511 0.10721
min 1.00000 4.30000 2.300000 1.000000 0.10000
25% 13.25000 4.80000 3.125000 1.400000 0.20000
50% 25.50000 5.00000 3.400000 1.500000 0.20000
75% 37.75000 5.20000 3.675000 1.575000 0.30000
max 50.00000 5.80000 4.400000 1.900000 0.60000

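In [35]: # The name irisVer suggests Iris-versicolor, but the filter as run selects
# 'Iris-setosa', so the summary printed here repeats In [34].
irisVer = (iris['Species'] == 'Iris-setosa')
print('Iris-setosa')
print(iris[irisVer].describe())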

Iris-setosa
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm
count 50.00000 50.00000 50.000000 50.000000 50.00000
mean 25.50000 5.00600 3.418000 1.464000 0.24400
std 14.57738 0.35249 0.381024 0.173511 0.10721
min 1.00000 4.30000 2.300000 1.000000 0.10000
25% 13.25000 4.80000 3.125000 1.400000 0.20000
50% 25.50000 5.00000 3.400000 1.500000 0.20000
75% 37.75000 5.20000 3.675000 1.575000 0.30000
max 50.00000 5.80000 4.400000 1.900000 0.60000

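In [36]: # The name irisVir suggests Iris-virginica, but the filter as run selects
# 'Iris-setosa', so the summary printed here repeats In [34].
irisVir = (iris['Species'] == 'Iris-setosa')
print('Iris-setosa')
print(iris[irisVir].describe())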

Iris-setosa
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm
count 50.00000 50.00000 50.000000 50.000000 50.00000
mean 25.50000 5.00600 3.418000 1.464000 0.24400
std 14.57738 0.35249 0.381024 0.173511 0.10721
min 1.00000 4.30000 2.300000 1.000000 0.10000
25% 13.25000 4.80000 3.125000 1.400000 0.20000
50% 25.50000 5.00000 3.400000 1.500000 0.20000
75% 37.75000 5.20000 3.675000 1.575000 0.30000
max 50.00000 5.80000 4.400000 1.900000 0.60000
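All three cells above filter on 'Iris-setosa', so each prints the same summary. A hedged sketch of how per-species summaries could be produced in one pass with `groupby` follows. It assumes the `Species` column also holds the labels 'Iris-versicolor' and 'Iris-virginica' (neither appears in the notebook), and it uses a tiny made-up frame `iris_demo` in place of Iris.csv.

```python
import pandas as pd

# Hypothetical miniature stand-in for Iris.csv; the measurements are invented.
iris_demo = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "PetalLengthCm": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "Species": ["Iris-setosa", "Iris-setosa",
                "Iris-versicolor", "Iris-versicolor",
                "Iris-virginica", "Iris-virginica"],
})

# describe() is applied to each species group separately.
summary = iris_demo.groupby("Species").describe()
print(summary["SepalLengthCm"][["count", "mean", "min", "max"]])

# Equivalent boolean-mask form for a single species.
versicolor = iris_demo[iris_demo["Species"] == "Iris-versicolor"]
print(versicolor.describe())
```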
