UQ21CA632B Unit2 Class12&13 Pandas Basics

The document provides a tutorial on basic operations using the Pandas library in Python, specifically focusing on reading and inspecting a dataset containing salary information. It covers various methods for data retrieval, grouping, and filtering based on conditions, using a dataset with columns such as rank, discipline, years since PhD, years of service, sex, and salary. Key functions demonstrated include reading CSV files, inspecting data with head and tail, and performing group and filter operations.

7/20/2021 Pandas Basics

Reading the Dataset


In [1]:
import pandas as pd

In [2]:
data=pd.read_csv("Salaries.csv")
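The `Unnamed: 0` column in the outputs that follow suggests that the first column of the CSV has no header. As a minimal sketch, assuming a file laid out that way, `index_col=0` asks `read_csv` to use that column as the row index instead of loading it as data. The `csv_text` string is a hypothetical stand-in for `Salaries.csv`, typed from the first three rows shown in this notebook.

```python
import io
import pandas as pd

# Hypothetical stand-in for Salaries.csv: three rows copied from the notebook
# output, with an unnamed leading column of row numbers (an assumption).
csv_text = """,rank,discipline,yrs.since.phd,yrs.service,sex,salary
1,Prof,B,19,18,Male,139750
2,Prof,B,20,16,Male,173200
3,AsstProf,B,4,3,Male,79750
"""

# Default read: the unnamed first column becomes a data column "Unnamed: 0".
plain = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses that first column as the row index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(plain.columns.tolist())
print(indexed.columns.tolist())
```

Whether to do this depends on whether the row numbers are meaningful; the notebook instead deletes the column later with `del`.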

Initial Inspection
In [3]:
data.head()

Out[3]: Unnamed: 0 rank discipline yrs.since.phd yrs.service sex salary

0 1 Prof B 19 18 Male 139750

1 2 Prof B 20 16 Male 173200

2 3 AsstProf B 4 3 Male 79750

3 4 Prof B 45 39 Male 115000

4 5 Prof B 40 41 Male 141500

In [4]:
data.head(10)

Out[4]: Unnamed: 0 rank discipline yrs.since.phd yrs.service sex salary

0 1 Prof B 19 18 Male 139750

1 2 Prof B 20 16 Male 173200

2 3 AsstProf B 4 3 Male 79750

3 4 Prof B 45 39 Male 115000

4 5 Prof B 40 41 Male 141500

5 6 AssocProf B 6 6 Male 97000

6 7 Prof B 30 23 Male 175000

7 8 Prof B 45 45 Male 147765

8 9 Prof B 21 20 Male 119250

9 10 Prof B 18 18 Female 129000

In [5]:
data.tail(10)

Out[5]: Unnamed: 0 rank discipline yrs.since.phd yrs.service sex salary

387 388 Prof A 29 15 Male 109305

388 389 Prof A 38 36 Male 119450

389 390 Prof A 33 18 Male 186023

390 391 Prof A 40 19 Male 166605

391 392 Prof A 30 19 Male 151292

392 393 Prof A 33 30 Male 103106

393 394 Prof A 31 19 Male 150564

394 395 Prof A 42 25 Male 101738

395 396 Prof A 25 15 Male 95329

396 397 AsstProf A 8 4 Male 81035

In [6]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 397 entries, 0 to 396
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 397 non-null int64
1 rank 397 non-null object
2 discipline 397 non-null object
3 yrs.since.phd 397 non-null int64
4 yrs.service 397 non-null int64
5 sex 397 non-null object
6 salary 397 non-null int64
dtypes: int64(4), object(3)
memory usage: 21.8+ KB

In [7]:
data.columns

Out[7]: Index(['Unnamed: 0', 'rank', 'discipline', 'yrs.since.phd', 'yrs.service',
       'sex', 'salary'],
      dtype='object')

In [8]:
data.index

Out[8]: RangeIndex(start=0, stop=397, step=1)

In [9]:
data.shape

Out[9]: (397, 7)

In [10]:
data.ndim

Out[10]: 2

In [11]:
data.describe() # summary statistics; covers only the numeric columns by default

Out[11]: Unnamed: 0 yrs.since.phd yrs.service salary

count 397.000000 397.000000 397.000000 397.000000

mean 199.000000 22.314861 17.614610 113706.458438

std 114.748275 12.887003 13.006024 30289.038695

min 1.000000 1.000000 0.000000 57800.000000

25% 100.000000 12.000000 7.000000 91000.000000

50% 199.000000 21.000000 16.000000 107300.000000

75% 298.000000 32.000000 27.000000 134185.000000

max 397.000000 56.000000 60.000000 231545.000000
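The summary above leaves out `rank`, `discipline` and `sex` because they hold text. As a small sketch on a hand-typed sample (five rows copied from the `head()` output, not the full file), passing `include="all"` to `describe()` adds `unique`, `top` and `freq` rows for the text columns.

```python
import pandas as pd

# Hand-typed sample: the first five rows shown by data.head().
sample = pd.DataFrame({
    "rank": ["Prof", "Prof", "AsstProf", "Prof", "Prof"],
    "discipline": ["B", "B", "B", "B", "B"],
    "sex": ["Male", "Male", "Male", "Male", "Male"],
    "salary": [139750, 173200, 79750, 115000, 141500],
})

# Default: only the numeric column is summarized.
numeric_summary = sample.describe()

# include="all": text columns get count/unique/top/freq; statistics that
# do not apply to a column are filled with NaN.
full_summary = sample.describe(include="all")

print(numeric_summary.columns.tolist())
print(full_summary.loc[["unique", "top", "freq"], ["rank", "discipline"]])
```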

Retrieval of Rows and Columns


In [12]:
data[['salary']]

Out[12]: salary

0 139750

1 173200

2 79750

3 115000

4 141500

... ...

392 103106

393 150564

394 101738

395 95329

396 81035

397 rows × 1 columns

In [13]:
data.iloc[:,6]

Out[13]: 0 139750
1 173200
2 79750
3 115000
4 141500
...
392 103106
393 150564
394 101738
395 95329
396 81035
Name: salary, Length: 397, dtype: int64

In [14]:
df=data.iloc[10:16,2:4] # rows 10 to 15, columns at positions 2 and 3

In [15]:
df

Out[15]: discipline yrs.since.phd

10 B 12

11 B 7

12 B 1

13 B 2

14 B 20

15 B 12

In [28]:
d1=data[['discipline','rank']]
d1

Out[28]: discipline rank

0 B Prof

1 B Prof

2 B AsstProf

3 B Prof

4 B Prof

... ... ...

392 A Prof

393 A Prof

394 A Prof

395 A Prof

396 A AsstProf

397 rows × 2 columns

In [30]:
data.iloc[:,2]

Out[30]: 0 B
1 B
2 B
3 B
4 B
..
392 A
393 A
394 A
395 A
396 A
Name: discipline, Length: 397, dtype: object
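The cells in this section mix several selection styles. The sketch below, on a hand-typed sample of the first five rows, contrasts them: single brackets return a Series, double brackets return a DataFrame, `iloc` slices by position and excludes the end, and `loc` slices by label and includes the end. The variable names are illustrative only.

```python
import pandas as pd

# Hand-typed sample: the first five rows shown by data.head().
sample = pd.DataFrame({
    "rank": ["Prof", "Prof", "AsstProf", "Prof", "Prof"],
    "discipline": ["B", "B", "B", "B", "B"],
    "yrs.since.phd": [19, 20, 4, 45, 40],
    "salary": [139750, 173200, 79750, 115000, 141500],
})

as_series = sample["salary"]    # single brackets: a Series
as_frame = sample[["salary"]]   # double brackets: a one-column DataFrame

# iloc works on positions and excludes the end: rows 1 and 2, columns 1 and 2.
by_position = sample.iloc[1:3, 1:3]

# loc works on labels and includes the end label: rows 1, 2 and 3.
by_label = sample.loc[1:3, ["discipline", "yrs.since.phd"]]

print(type(as_series).__name__, type(as_frame).__name__)
print(by_position.shape, by_label.shape)
```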

Grouping of Data
In [33]:
k=data.groupby('discipline') # GroupBy object with one group per discipline

In [34]:
k.groups.keys()

Out[34]: dict_keys(['A', 'B'])

In [35]:
d2=k.get_group('A')
d2.head()

Out[35]: Unnamed: 0 rank discipline yrs.since.phd yrs.service sex salary

17 18 Prof A 38 34 Male 103450

18 19 Prof A 37 23 Male 124750

19 20 Prof A 39 36 Female 137000

20 21 Prof A 31 26 Male 89565

21 22 Prof A 36 31 Male 102580

In [36]:
d2.count()

Out[36]: Unnamed: 0 181
rank 181
discipline 181
yrs.since.phd 181
yrs.service 181
sex 181
salary 181
dtype: int64

In [37]:
len(d2)

Out[37]: 181

In [39]:
d2.shape

Out[39]: (181, 7)
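Pulling out one group with `get_group` and counting it works, but a GroupBy object can also summarize every group at once. A minimal sketch on six hand-typed rows (three from each discipline, copied from the outputs above):

```python
import pandas as pd

# Hand-typed sample: rows 0-2 (discipline B) and rows 17-19 (discipline A).
sample = pd.DataFrame({
    "discipline": ["B", "B", "B", "A", "A", "A"],
    "salary": [139750, 173200, 79750, 103450, 124750, 137000],
})

groups = sample.groupby("discipline")

# Number of rows in each group, indexed by the group key.
counts = groups.size()

# Mean salary per discipline, computed for all groups in one call.
mean_salary = groups["salary"].mean()

print(counts)
print(mean_salary)
```

On the full dataset, `data.groupby('discipline').size()` should report the same 181 rows for discipline A that `len(d2)` gives.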

Filtering Records Based on a Condition


In [43]:
d3=data[data['discipline']=='A']

In [44]:
len(d3)

Out[44]: 181

In [45]:
data[data['yrs.service']>30]

Out[45]: Unnamed: 0 rank discipline yrs.since.phd yrs.service sex salary

3 4 Prof B 45 39 Male 115000

4 5 Prof B 40 41 Male 141500

7 8 Prof B 45 45 Male 147765

17 18 Prof A 38 34 Male 103450

19 20 Prof A 39 36 Female 137000

... ... ... ... ... ... ... ...

365 366 Prof A 43 40 Male 101036

369 370 Prof A 33 31 Male 134690

378 379 Prof A 38 38 Male 150680

383 384 Prof A 44 44 Male 105000

388 389 Prof A 38 36 Male 119450

70 rows × 7 columns

In [48]:
f=(data['discipline']=='A') & (data['yrs.service']>30)
f # display the Boolean mask

Out[48]: 0 False
1 False
2 False
3 False
4 False
...
392 False
393 False
394 False
395 False
396 False
Length: 397, dtype: bool

In [49]:
data[f]

Out[49]: Unnamed: 0 rank discipline yrs.since.phd yrs.service sex salary

17 18 Prof A 38 34 Male 103450

19 20 Prof A 39 36 Female 137000

21 22 Prof A 36 31 Male 102580

109 110 Prof A 40 31 Male 131205

113 114 Prof A 37 37 Male 104279

117 118 Prof A 39 36 Male 117515

121 122 Prof A 32 32 Male 124309

125 126 Prof A 54 49 Male 78162

131 132 Prof A 56 57 Male 76840

224 225 Prof A 38 35 Male 87800

229 230 Prof A 39 38 Male 133900

238 239 Prof A 46 40 Male 77202

242 243 Prof A 38 37 Male 102600

250 251 Prof A 39 39 Male 109000

260 261 AssocProf A 41 33 Male 88600

261 262 Prof A 45 45 Male 107550

263 264 Prof A 31 31 Male 126000

264 265 Prof A 37 35 Male 99000

266 267 Prof A 43 43 Male 143940

268 269 Prof A 47 44 Male 89650

270 271 Prof A 42 40 Male 143250

276 277 Prof A 52 48 Male 107200

279 280 Prof A 46 46 Male 100600

280 281 Prof A 39 38 Male 136500

282 283 Prof A 51 51 Male 57800

283 284 Prof A 45 43 Male 155865

285 286 AssocProf A 49 49 Male 81800

295 296 Prof A 40 36 Male 97150

298 299 Prof A 49 43 Male 72300

299 300 AssocProf A 45 39 Male 70700

300 301 Prof A 39 36 Male 88600

304 305 Prof A 46 44 Male 144050

305 306 Prof A 33 31 Male 111350

313 314 Prof A 35 35 Male 100351

356 357 Prof A 49 40 Male 88709

357 358 Prof A 39 35 Male 107309

364 365 Prof A 43 43 Male 205500

365 366 Prof A 43 40 Male 101036

369 370 Prof A 33 31 Male 134690

378 379 Prof A 38 38 Male 150680

383 384 Prof A 44 44 Male 105000

388 389 Prof A 38 36 Male 119450
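The mask `f` combines two conditions with `&`. The same pattern extends to `|` (or) and `~` (not), and each comparison needs its own parentheses because these operators bind more tightly than `==` and `>`. A sketch on six hand-typed rows taken from the outputs above (rows 0, 3, 4, 17, 18 and 19); the variable names are illustrative:

```python
import pandas as pd

# Hand-typed sample copied from rows 0, 3, 4, 17, 18 and 19 of the dataset.
sample = pd.DataFrame({
    "discipline": ["B", "B", "B", "A", "A", "A"],
    "yrs.service": [18, 39, 41, 34, 23, 36],
    "salary": [139750, 115000, 141500, 103450, 124750, 137000],
})

is_a = sample["discipline"] == "A"
long_service = sample["yrs.service"] > 30

both = sample[is_a & long_service]          # discipline A and over 30 years
either = sample[is_a | long_service]        # at least one condition holds
neither = sample[~(is_a | long_service)]    # no condition holds

# .loc takes the mask and a column label together: filter rows, pick a column.
salaries = sample.loc[is_a & long_service, "salary"]

print(len(both), len(either), len(neither))
print(salaries.tolist())
```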

What is the average length of service for professors?


In [50]:
d4=data[['yrs.service','rank']]
d4.head()

Out[50]: yrs.service rank

0 18 Prof

1 16 Prof

2 3 AsstProf

3 39 Prof

4 41 Prof

In [52]:
d4n=d4[d4['rank']=='Prof']

In [53]:
d4n.count()

Out[53]: yrs.service 266
rank 266
dtype: int64

In [54]:
d4n['yrs.service'].mean()

Out[54]: 22.81578947368421
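The steps above (select two columns, filter to `Prof`, take the mean) can be collapsed. As a sketch on six hand-typed rows (rows 0 to 5 of the dataset), one line filters and averages, and a `groupby` gives the average for every rank at once.

```python
import pandas as pd

# Hand-typed sample: rows 0-5 of the dataset as shown by data.head(10).
sample = pd.DataFrame({
    "rank": ["Prof", "Prof", "AsstProf", "Prof", "Prof", "AssocProf"],
    "yrs.service": [18, 16, 3, 39, 41, 6],
})

# Filter and average in one step with .loc[row_mask, column].
prof_mean = sample.loc[sample["rank"] == "Prof", "yrs.service"].mean()

# Average service for every rank in a single call.
mean_by_rank = sample.groupby("rank")["yrs.service"].mean()

print(prof_mean)
print(mean_by_rank)
```

The numbers here describe only the six sample rows; the value for the full dataset is the 22.8 years shown in `Out[54]`.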

In [55]:
d4n

Out[55]: yrs.service rank

0 18 Prof

1 16 Prof

3 39 Prof

4 41 Prof

6 23 Prof

... ... ...

391 19 Prof

392 30 Prof

393 19 Prof

394 25 Prof

395 15 Prof

266 rows × 2 columns

In [56]:
d4n.index

Out[56]: Int64Index([  0,   1,   3,   4,   6,   7,   8,   9,  14,  15,
            ...
            386, 387, 388, 389, 390, 391, 392, 393, 394, 395],
           dtype='int64', length=266)

In [57]:
d4n.set_index('rank')

Out[57]: yrs.service

rank

Prof 18

Prof 16

Prof 39

Prof 41

Prof 23

... ...

Prof 19

Prof 30

Prof 19

Prof 25

Prof 15

266 rows × 1 columns
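Note that `set_index` returns a new DataFrame and leaves the original untouched, so the result has to be assigned to a name to be kept. A short sketch on three hand-typed rows, also showing `reset_index` as the reverse step:

```python
import pandas as pd

# Hand-typed sample: three Prof rows from the output above.
sample = pd.DataFrame({
    "yrs.service": [18, 16, 39],
    "rank": ["Prof", "Prof", "Prof"],
})

# set_index returns a new DataFrame; sample still has its "rank" column.
by_rank = sample.set_index("rank")

# reset_index moves the index back into an ordinary column.
restored = by_rank.reset_index()

print(sample.columns.tolist())
print(by_rank.index.name, by_rank.columns.tolist())
print(restored.columns.tolist())
```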

Adding and Deleting Columns


In [58]:
del data['Unnamed: 0'] # removes the column from data in place

In [59]:
data

Out[59]: rank discipline yrs.since.phd yrs.service sex salary

0 Prof B 19 18 Male 139750

1 Prof B 20 16 Male 173200

2 AsstProf B 4 3 Male 79750

3 Prof B 45 39 Male 115000

4 Prof B 40 41 Male 141500

... ... ... ... ... ... ...

392 Prof A 33 30 Male 103106

393 Prof A 31 19 Male 150564

394 Prof A 42 25 Male 101738

395 Prof A 25 15 Male 95329

396 AsstProf A 8 4 Male 81035

397 rows × 6 columns

In [64]:
data.drop('rank',axis=1)

Out[64]: discipline yrs.since.phd yrs.service sex salary

0 B 19 18 Male 139750

1 B 20 16 Male 173200

2 B 4 3 Male 79750

3 B 45 39 Male 115000

4 B 40 41 Male 141500

... ... ... ... ... ...

392 A 33 30 Male 103106

393 A 31 19 Male 150564

394 A 42 25 Male 101738

395 A 25 15 Male 95329

396 A 8 4 Male 81035

397 rows × 5 columns
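Unlike `del`, `drop` does not modify `data`: it returns a new DataFrame, which is why `rank` is still present in the later outputs. A minimal sketch on three hand-typed rows showing both the non-mutating call and the reassignment needed to keep the result:

```python
import pandas as pd

# Hand-typed sample: the first three rows of the dataset.
sample = pd.DataFrame({
    "rank": ["Prof", "Prof", "AsstProf"],
    "discipline": ["B", "B", "B"],
    "salary": [139750, 173200, 79750],
})

# drop returns a new DataFrame; sample keeps all three columns.
without_rank = sample.drop(columns="rank")
columns_after_drop = sample.columns.tolist()

# To make the removal stick, assign the result back to a name.
trimmed = sample.drop(columns=["rank", "discipline"])

print(columns_after_drop)
print(without_rank.columns.tolist())
print(trimmed.columns.tolist())
```

`drop(columns="rank")` is equivalent to the `drop('rank', axis=1)` form used in the notebook.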

In [65]:
data['New rank']=d4['rank'] # values are matched to rows by index label

In [66]:
data

Out[66]: rank discipline yrs.since.phd yrs.service sex salary New rank

0 Prof B 19 18 Male 139750 Prof

1 Prof B 20 16 Male 173200 Prof

2 AsstProf B 4 3 Male 79750 AsstProf

3 Prof B 45 39 Male 115000 Prof

4 Prof B 40 41 Male 141500 Prof

... ... ... ... ... ... ... ...

392 Prof A 33 30 Male 103106 Prof

393 Prof A 31 19 Male 150564 Prof

394 Prof A 42 25 Male 101738 Prof

395 Prof A 25 15 Male 95329 Prof

396 AsstProf A 8 4 Male 81035 AsstProf

397 rows × 7 columns

In [69]:
data['A']=range(0,397,1) # length must equal the number of rows (397)

In [70]:
data

Out[70]: rank discipline yrs.since.phd yrs.service sex salary New rank A

0 Prof B 19 18 Male 139750 Prof 0

1 Prof B 20 16 Male 173200 Prof 1

2 AsstProf B 4 3 Male 79750 AsstProf 2

3 Prof B 45 39 Male 115000 Prof 3

4 Prof B 40 41 Male 141500 Prof 4

... ... ... ... ... ... ... ... ...

392 Prof A 33 30 Male 103106 Prof 392

393 Prof A 31 19 Male 150564 Prof 393

394 Prof A 42 25 Male 101738 Prof 394

395 Prof A 25 15 Male 95329 Prof 395

396 AsstProf A 8 4 Male 81035 AsstProf 396

397 rows × 8 columns
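New columns are more often computed from existing ones than copied or filled with a range. The sketch below, on three hand-typed rows, shows direct assignment and the non-mutating `assign` method. The column names `salary_k` and `gap` are hypothetical and do not appear in the original notebook.

```python
import pandas as pd

# Hand-typed sample: the first three rows of the dataset.
sample = pd.DataFrame({
    "yrs.since.phd": [19, 20, 4],
    "yrs.service": [18, 16, 3],
    "salary": [139750, 173200, 79750],
})

# Direct assignment adds the column to sample itself.
sample["salary_k"] = sample["salary"] / 1000

# assign returns a new DataFrame with the extra column; sample is unchanged.
with_gap = sample.assign(gap=sample["yrs.since.phd"] - sample["yrs.service"])

print(sample.columns.tolist())
print(with_gap["gap"].tolist())
```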
