
Dsbda 3

This notebook loads employee data from a CSV file with pandas and computes summary statistics (mean, median, mode, minimum, maximum, and standard deviation), both per column and per row, and per gender group. It then one-hot encodes the Gender column and prints per-species descriptive statistics for the Iris dataset.

Uploaded by Arbaz Shaikh

3/22/24, 6:39 PM Pract3

In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]: ed=pd.read_csv("Employee_Data.csv")

In [3]: ed

file:///C:/Users/Arbaz shaikh/AppData/Local/Microsoft/Windows/INetCache/IE/HEH3L20Y/Pract3[1].html 1/12


Out[3]:
    ID  Experience_Years  Age  Gender    Salary
0    1                 5   28  Female    250000
1    2                 1   21    Male     50000
2    3                 3   23  Female    170000
3    4                 2   22    Male     25000
4    5                 1   17    Male     10000
5    6                25   62    Male   5001000
6    7                19   54  Female    800000
7    8                 2   21  Female      9000
8    9                10   36  Female     61500
9   10                15   54  Female    650000
10  11                 4   26  Female    250000
11  12                 6   29    Male   1400000
12  13                14   39    Male   6000050
13  14                11   40    Male    220100
14  15                 2   23    Male      7500
15  16                 4   27  Female     87000
16  17                10   34  Female    930000
17  18                15   54  Female   7900000
18  19                 2   21    Male     15000
19  20                10   36    Male    330000
20  21                15   54    Male   6570000
21  22                 4   26    Male     25000
22  23                 5   29    Male   6845000
23  24                 1   21  Female      6000
24  25                 4   23  Female      8900
25  26                 3   22  Female     20000
26  27                 1   18    Male      3000
27  28                27   62  Female  10000000
28  29                19   54  Female   5000000
29  30                 2   21  Female      6100
30  31                10   34    Male     80000
31  32                15   54    Male    900000
32  33                20   55  Female   1540000
33  34                19   53  Female   9300000
34  35                16   49    Male   7600000

In [4]: ed.mean(numeric_only=True)  # skip the non-numeric Gender column

Out[4]:
ID                  1.800000e+01
Experience_Years    9.200000e+00
Age                 3.548571e+01
Salary              2.059147e+06
dtype: float64
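
Reductions such as the mean are only defined for numeric columns, so a text column like Gender has to be excluded one way or another. The sketch below shows two equivalent ways of doing that on a small stand-in frame; the frame `df` and its values are illustrative assumptions, not the contents of Employee_Data.csv.

```python
import pandas as pd

# Toy frame standing in for the employee data (illustrative values only).
df = pd.DataFrame({
    "Age": [28, 21, 23],
    "Gender": ["Female", "Male", "Female"],
    "Salary": [250000, 50000, 170000],
})

# Option 1: ask the reduction itself to ignore non-numeric columns.
means_flag = df.mean(numeric_only=True)

# Option 2: select the numeric columns first, then reduce.
means_select = df.select_dtypes(include="number").mean()

print(means_flag)
print(means_flag.equals(means_select))
```

Either form avoids relying on pandas silently dropping the text column, which newer pandas versions no longer do.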

In [5]: ed.loc[:,'Age'].mean()

Out[5]: 35.48571428571429

In [6]: ed.mean(axis=1, numeric_only=True)[0:4]  # row-wise mean over the numeric columns

Out[6]:
0    62508.50
1    12506.00
2    42507.25
3     6257.00
dtype: float64
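
The `axis` argument decides the direction of the reduction: `axis=0` (the default) gives one value per column, while `axis=1` gives one value per row. A minimal sketch on a hypothetical two-column frame named `scores`:

```python
import pandas as pd

# Hypothetical data chosen so the results are easy to check by hand.
scores = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

col_means = scores.mean(axis=0)  # one mean per column: a and b
row_means = scores.mean(axis=1)  # one mean per row: index 0, 1, 2

print(col_means)
print(row_means)
```

In the employee table the row-wise result mixes ID, years of experience, age, and salary, so it mainly serves to demonstrate the `axis` argument rather than to give an interpretable quantity.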

In [7]: ed.median(numeric_only=True)  # skip the non-numeric Gender column

Out[7]:
ID                      18.0
Experience_Years         6.0
Age                     29.0
Salary              250000.0
dtype: float64
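
The Salary column has a mean of about 2.06 million but a median of 250,000, which points to a strongly right-skewed distribution in which a few very large salaries pull the mean upward. The sketch below reproduces that effect on a small made-up series; the numbers are assumptions chosen for illustration.

```python
import pandas as pd

# Four modest salaries and one very large one (made-up values).
salaries = pd.Series([20000, 25000, 30000, 35000, 5000000])

mean_salary = salaries.mean()      # pulled up by the single large value
median_salary = salaries.median()  # stays at the middle observation

print(mean_salary, median_salary)
```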

In [8]: ed.loc[:,'Age'].median()

Out[8]: 29.0

In [9]: ed.median(axis=1, numeric_only=True)[0:10]  # row-wise median over the numeric columns

Out[9]:
0    16.5
1    11.5
2    13.0
3    13.0
4    11.0
5    43.5
6    36.5
7    14.5
8    23.0
9    34.5
dtype: float64

In [10]: ed.mode()

Out[10]:
    ID  Experience_Years   Age  Gender    Salary
0    1               2.0  54.0  Female   25000.0
1    2               NaN   NaN     NaN  250000.0
2    3               NaN   NaN     NaN       NaN
3    4               NaN   NaN     NaN       NaN
4    5               NaN   NaN     NaN       NaN
5    6               NaN   NaN     NaN       NaN
6    7               NaN   NaN     NaN       NaN
7    8               NaN   NaN     NaN       NaN
8    9               NaN   NaN     NaN       NaN
9   10               NaN   NaN     NaN       NaN
10  11               NaN   NaN     NaN       NaN
11  12               NaN   NaN     NaN       NaN
12  13               NaN   NaN     NaN       NaN
13  14               NaN   NaN     NaN       NaN
14  15               NaN   NaN     NaN       NaN
15  16               NaN   NaN     NaN       NaN
16  17               NaN   NaN     NaN       NaN
17  18               NaN   NaN     NaN       NaN
18  19               NaN   NaN     NaN       NaN
19  20               NaN   NaN     NaN       NaN
20  21               NaN   NaN     NaN       NaN
21  22               NaN   NaN     NaN       NaN
22  23               NaN   NaN     NaN       NaN
23  24               NaN   NaN     NaN       NaN
24  25               NaN   NaN     NaN       NaN
25  26               NaN   NaN     NaN       NaN
26  27               NaN   NaN     NaN       NaN
27  28               NaN   NaN     NaN       NaN
28  29               NaN   NaN     NaN       NaN
29  30               NaN   NaN     NaN       NaN
30  31               NaN   NaN     NaN       NaN
31  32               NaN   NaN     NaN       NaN
32  33               NaN   NaN     NaN       NaN
33  34               NaN   NaN     NaN       NaN
34  35               NaN   NaN     NaN       NaN

In [11]: ed.loc[:,'Age'].mode()

Out[11]:
0    54
Name: Age, dtype: int64
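
`DataFrame.mode()` returns a frame rather than a single row because a column can have several modes. Each column lists its modes from the top, and shorter columns are padded with NaN, which is why the ID column (all values unique) fills every row while the other columns are mostly NaN. A minimal sketch with a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],       # all unique, so every value is a mode
    "age": [21, 21, 30, 54],  # a single mode: 21
})

modes = df.mode()
print(modes)

# If only one mode per column is wanted, take the first row.
first_modes = modes.iloc[0]
print(first_modes)
```

Taking the first row is a simplification: it silently drops additional modes when a column has ties.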

In [12]: ed.min()

Out[12]:
ID                       1
Experience_Years         1
Age                     17
Gender              Female
Salary                3000
dtype: object

In [13]: ed.loc[:, 'Age'].min(skipna=False)

Out[13]: 17

In [14]: ed.loc[:, 'Age'].min(skipna=True)

Out[14]: 17
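
Both calls return 17 because the Age column has no missing values; `skipna` only changes the result when NaN is present. The sketch below uses a hypothetical series with one missing entry to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing value.
ages = pd.Series([28, np.nan, 17, 62])

min_skip = ages.min(skipna=True)    # NaN is ignored
min_keep = ages.min(skipna=False)   # NaN propagates to the result

print(min_skip, min_keep)
```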

In [15]: ed.max()

Out[15]:
ID                        35
Experience_Years          27
Age                       62
Gender                  Male
Salary              10000000
dtype: object

In [16]: ed.loc[:, 'Age'].max(skipna=False)

Out[16]: 62

In [17]: ed.loc[:, 'Age'].max(skipna=True)

Out[17]: 62

In [18]: ed.std(numeric_only=True)  # skip the non-numeric Gender column

Out[18]:
ID                  1.024695e+01
Experience_Years    7.552950e+00
Age                 1.464355e+01
Salary              3.170124e+06
dtype: float64

In [19]: ed.loc[:,'Age'].std()

Out[19]: 14.643551940884361
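
pandas computes the sample standard deviation by default (`ddof=1`, dividing by n - 1), whereas NumPy arrays default to the population form (`ddof=0`, dividing by n). The sketch below compares the two on a small made-up series whose population standard deviation is exactly 2.

```python
import pandas as pd

# Made-up values: mean 5, sum of squared deviations 32.
x = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

sample_std = x.std()            # ddof=1, pandas default: sqrt(32 / 7)
population_std = x.std(ddof=0)  # divide by n: sqrt(32 / 8) = 2.0
numpy_std = x.to_numpy().std()  # NumPy array default is ddof=0

print(sample_std, population_std, numpy_std)
```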

In [20]: ed.std(axis=1, numeric_only=True)[0:4]  # row-wise standard deviation over the numeric columns

Out[20]:
0    124994.333900
1     24996.001694
2     84995.167190
3     12495.336570
dtype: float64

In [21]: ed.groupby(['Gender'])['Age'].mean()

Out[21]:
Gender
Female    37.111111
Male      33.764706
Name: Age, dtype: float64

In [22]: ed_u = ed.rename(columns={'Salary': 'Income'})
ed_u.groupby('Gender')['Income'].mean()

Out[22]:
Gender
Female    2.054917e+06
Male      2.063626e+06
Name: Income, dtype: float64
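
Several group statistics can be computed in one pass with `agg` instead of one `groupby` call per statistic. The sketch below is a minimal example on a hypothetical four-row frame that reuses the first four rows of the employee table, with Salary already renamed to Income.

```python
import pandas as pd

# First four rows of the employee table, typed in by hand for illustration.
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Age": [28, 21, 23, 22],
    "Income": [250000, 50000, 170000, 25000],
})

# One row per gender, one column per statistic.
summary = df.groupby("Gender")["Income"].agg(["count", "mean", "median", "min", "max"])
print(summary)
```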

In [23]: from sklearn import preprocessing
enc = preprocessing.OneHotEncoder()
enc_ed = pd.DataFrame(enc.fit_transform(ed[['Gender']]).toarray())

In [24]: enc_ed

Out[24]:
      0    1
0   1.0  0.0
1   0.0  1.0
2   1.0  0.0
3   0.0  1.0
4   0.0  1.0
5   0.0  1.0
6   1.0  0.0
7   1.0  0.0
8   1.0  0.0
9   1.0  0.0
10  1.0  0.0
11  0.0  1.0
12  0.0  1.0
13  0.0  1.0
14  0.0  1.0
15  1.0  0.0
16  1.0  0.0
17  1.0  0.0
18  0.0  1.0
19  0.0  1.0
20  0.0  1.0
21  0.0  1.0
22  0.0  1.0
23  1.0  0.0
24  1.0  0.0
25  1.0  0.0
26  0.0  1.0
27  1.0  0.0
28  1.0  0.0
29  1.0  0.0
30  0.0  1.0
31  0.0  1.0
32  1.0  0.0
33  1.0  0.0
34  0.0  1.0

In [25]: ed_encode=ed_u.join(enc_ed)
ed_encode

Out[25]:
    ID  Experience_Years  Age  Gender    Income    0    1
0    1                 5   28  Female    250000  1.0  0.0
1    2                 1   21    Male     50000  0.0  1.0
2    3                 3   23  Female    170000  1.0  0.0
3    4                 2   22    Male     25000  0.0  1.0
4    5                 1   17    Male     10000  0.0  1.0
5    6                25   62    Male   5001000  0.0  1.0
6    7                19   54  Female    800000  1.0  0.0
7    8                 2   21  Female      9000  1.0  0.0
8    9                10   36  Female     61500  1.0  0.0
9   10                15   54  Female    650000  1.0  0.0
10  11                 4   26  Female    250000  1.0  0.0
11  12                 6   29    Male   1400000  0.0  1.0
12  13                14   39    Male   6000050  0.0  1.0
13  14                11   40    Male    220100  0.0  1.0
14  15                 2   23    Male      7500  0.0  1.0
15  16                 4   27  Female     87000  1.0  0.0
16  17                10   34  Female    930000  1.0  0.0
17  18                15   54  Female   7900000  1.0  0.0
18  19                 2   21    Male     15000  0.0  1.0
19  20                10   36    Male    330000  0.0  1.0
20  21                15   54    Male   6570000  0.0  1.0
21  22                 4   26    Male     25000  0.0  1.0
22  23                 5   29    Male   6845000  0.0  1.0
23  24                 1   21  Female      6000  1.0  0.0
24  25                 4   23  Female      8900  1.0  0.0
25  26                 3   22  Female     20000  1.0  0.0
26  27                 1   18    Male      3000  0.0  1.0
27  28                27   62  Female  10000000  1.0  0.0
28  29                19   54  Female   5000000  1.0  0.0
29  30                 2   21  Female      6100  1.0  0.0
30  31                10   34    Male     80000  0.0  1.0
31  32                15   54    Male    900000  0.0  1.0
32  33                20   55  Female   1540000  1.0  0.0
33  34                19   53  Female   9300000  1.0  0.0
34  35                16   49    Male   7600000  0.0  1.0
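
In the joined table the encoded columns are labelled only 0 and 1; from the rows, column 0 marks Female and column 1 marks Male. As an alternative sketch (a different technique from the OneHotEncoder used in this notebook), `pd.get_dummies` produces the same indicators with readable column names. The frame `df` below is a hypothetical three-row stand-in.

```python
import pandas as pd

# Hypothetical stand-in for the employee table.
df = pd.DataFrame({"ID": [1, 2, 3], "Gender": ["Female", "Male", "Female"]})

# One indicator column per category, named Gender_<category>.
dummies = pd.get_dummies(df["Gender"], prefix="Gender", dtype=float)

# Join on the shared row index, as the notebook does with enc_ed.
encoded = df.join(dummies)
print(encoded)
```

Named columns make the encoded frame self-describing, which matters once more than one categorical column is encoded.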

In [26]: import pandas as pd

In [27]: csv_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'

In [28]: col_names =['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width','Species']

In [29]: iris = pd.read_csv(csv_url, names = col_names)

In [30]: iris

Out[30]:
     Sepal_Length  Sepal_Width  Petal_Length  Petal_Width         Species
0             5.1          3.5           1.4          0.2     Iris-setosa
1             4.9          3.0           1.4          0.2     Iris-setosa
2             4.7          3.2           1.3          0.2     Iris-setosa
3             4.6          3.1           1.5          0.2     Iris-setosa
4             5.0          3.6           1.4          0.2     Iris-setosa
..            ...          ...           ...          ...             ...
145           6.7          3.0           5.2          2.3  Iris-virginica
146           6.3          2.5           5.0          1.9  Iris-virginica
147           6.5          3.0           5.2          2.0  Iris-virginica
148           6.2          3.4           5.4          2.3  Iris-virginica
149           5.9          3.0           5.1          1.8  Iris-virginica

150 rows × 5 columns

In [31]: irisSet = (iris['Species']== 'Iris-setosa')

In [32]: print('Iris-setosa')
print(iris[irisSet].describe())

Iris-setosa
       Sepal_Length  Sepal_Width  Petal_Length  Petal_Width
count      50.00000    50.000000     50.000000     50.00000
mean        5.00600     3.418000      1.464000      0.24400
std         0.35249     0.381024      0.173511      0.10721
min         4.30000     2.300000      1.000000      0.10000
25%         4.80000     3.125000      1.400000      0.20000
50%         5.00000     3.400000      1.500000      0.20000
75%         5.20000     3.675000      1.575000      0.30000
max         5.80000     4.400000      1.900000      0.60000

In [33]: irisVer = (iris['Species']== 'Iris-versicolor')

In [34]: print('Iris-versicolor')
print(iris[irisVer].describe())

Iris-versicolor
       Sepal_Length  Sepal_Width  Petal_Length  Petal_Width
count     50.000000    50.000000     50.000000    50.000000
mean       5.936000     2.770000      4.260000     1.326000
std        0.516171     0.313798      0.469911     0.197753
min        4.900000     2.000000      3.000000     1.000000
25%        5.600000     2.525000      4.000000     1.200000
50%        5.900000     2.800000      4.350000     1.300000
75%        6.300000     3.000000      4.600000     1.500000
max        7.000000     3.400000      5.100000     1.800000

In [35]: irisVir = (iris['Species']== 'Iris-virginica')

In [36]: print('Iris-virginica')
print(iris[irisVir].describe())

Iris-virginica
       Sepal_Length  Sepal_Width  Petal_Length  Petal_Width
count      50.00000    50.000000     50.000000     50.00000
mean        6.58800     2.974000      5.552000      2.02600
std         0.63588     0.322497      0.551895      0.27465
min         4.90000     2.200000      4.500000      1.40000
25%         6.22500     2.800000      5.100000      1.80000
50%         6.50000     3.000000      5.550000      2.00000
75%         6.90000     3.175000      5.875000      2.30000
max         7.90000     3.800000      6.900000      2.50000
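
The three boolean masks above can be replaced by a single `groupby`, which computes the same descriptive statistics for every species at once. The sketch below uses a hypothetical six-row frame with one measurement column so that it runs without downloading the dataset; the values are illustrative.

```python
import pandas as pd

# Hypothetical miniature of the iris table: two rows per species.
iris_small = pd.DataFrame({
    "Sepal_Length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "Species": [
        "Iris-setosa", "Iris-setosa",
        "Iris-versicolor", "Iris-versicolor",
        "Iris-virginica", "Iris-virginica",
    ],
})

# One row per species; columns are count, mean, std, min, 25%, 50%, 75%, max.
per_species = iris_small.groupby("Species")["Sepal_Length"].describe()
print(per_species)
```

On the full frame, `iris.groupby('Species').describe()` would give the same statistics for all four measurement columns.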
