Panda Programs

The document discusses Pandas data series and data frames. It provides code examples to: 1) Create a Pandas series from a list and display it. 2) Convert a Pandas series to a Python list and check its type. 3) Perform basic math operations on two Pandas series. 4) Convert a NumPy array to a Pandas series. It also provides code examples to: 1) Create a Pandas data frame from a dictionary and set index labels. 2) Change values in a data frame column. 3) Add a new column to an existing data frame. 4) Get the column headers from a data frame.

Uploaded by

friend

--------->>>>>Pandas Data series:

1. Write a Pandas program to create and display a one-dimensional array-like object containing an
array of data

Code:

import pandas as pd
ds = pd.Series([2, 4, 6, 8, 10])
print(ds)
Sample Output:

0 2
1 4
2 6
3 8
4 10
dtype: int64

2. Write a Pandas program to convert a Pandas Series to a Python list and display its type

Code:

import pandas as pd
ds = pd.Series([2, 4, 6, 8, 10])
print("Pandas Series and type")
print(ds)
print(type(ds))
print("Convert Pandas Series to Python list")
print(ds.tolist())
print(type(ds.tolist()))
Sample Output:
Pandas Series and type
0 2
1 4
2 6
3 8
4 10
dtype: int64
<class 'pandas.core.series.Series'>
Convert Pandas Series to Python list
[2, 4, 6, 8, 10]
<class 'list'>

3. Write a Pandas program to add, subtract, multiply and divide two Pandas Series
Sample Series: [2, 4, 6, 8, 10], [1, 3, 5, 7, 9]

Code:

import pandas as pd
ds1 = pd.Series([2, 4, 6, 8, 10])
ds2 = pd.Series([1, 3, 5, 7, 9])
ds = ds1 + ds2
print("Add two Series:")
print(ds)
print("Subtract two Series:")
ds = ds1 - ds2
print(ds)
print("Multiply two Series:")
ds = ds1 * ds2
print(ds)
print("Divide Series1 by Series2:")
ds = ds1 / ds2
print(ds)
Sample Output:
Add two Series:
0 3
1 7
2 11
3 15
4 19
dtype: int64
Subtract two Series:
0 1
1 1
2 1
3 1
4 1
dtype: int64
Multiply two Series:
0 2
1 12
2 30
3 56
4 90
dtype: int64
Divide Series1 by Series2:
0 2.000000
1 1.333333
2 1.200000
3 1.142857
4 1.111111
dtype: float64
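
Note: the arithmetic operators used above also have method forms on a Series. The following is a minimal sketch, assuming the same two sample Series, that uses add, sub, mul and div to produce the same element-wise results. The variable names are illustrative only.

```python
import pandas as pd

ds1 = pd.Series([2, 4, 6, 8, 10])
ds2 = pd.Series([1, 3, 5, 7, 9])

# Method equivalents of +, -, * and /; values are matched on the index labels
added = ds1.add(ds2)
subtracted = ds1.sub(ds2)
multiplied = ds1.mul(ds2)
divided = ds1.div(ds2)

print(added.tolist())
print(subtracted.tolist())
print(multiplied.tolist())
print(divided.round(6).tolist())
```

The method forms are convenient when chaining calls, and they accept a fill_value argument for labels that are missing from one of the Series.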

4. Write a Pandas program to convert a NumPy array to a Pandas series.


Sample NumPy array: d1 = [10, 20, 30, 40, 50]

Code:

import numpy as np
import pandas as pd
np_array = np.array([10, 20, 30, 40, 50])
print("NumPy array:")
print(np_array)
new_series = pd.Series(np_array)
print("Converted Pandas series:")
print(new_series)
Sample Output:
NumPy array:
[10 20 30 40 50]
Converted Pandas series:
0 10
1 20
2 30
3 40
4 50
dtype: int64

--------->>>>>Pandas Data Frames:

Consider Sample python dictionary Data and its labels:

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

1. Write a Pandas program to create and display a DataFrame from a specified dictionary data which has
the index labels.
Code:

import pandas as pd

import numpy as np

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data , index=labels)

print(df)

Sample Output:

attempts name qualify score

a 1 Anastasia yes 12.5

b 3 Dima no 9.0

c 2 Katherine yes 16.5

d 3 James no NaN

e 2 Emily no 9.0

f 3 Michael yes 20.0

g 1 Matthew yes 14.5

h 1 Laura no NaN

i 2 Kevin no 8.0

j 1 Jonas yes 19.0


2. Write a Pandas program to change the name 'James' to 'Suresh' in name column of the data frame.

Code:

import pandas as pd

import numpy as np

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data , index=labels)

print("Original rows:")

print(df)

print("\nChange the name 'James' to ‘Suresh’:")

df['name'] = df['name'].replace('James', 'Suresh')

print(df)

Sample Output:

Original rows:

attempts name qualify score

a 1 Anastasia yes 12.5

b 3 Dima no 9.0

c 2 Katherine yes 16.5

d 3 James no NaN

e 2 Emily no 9.0

f 3 Michael yes 20.0

g 1 Matthew yes 14.5


h 1 Laura no NaN

i 2 Kevin no 8.0

j 1 Jonas yes 19.0

Change the name 'James' to ‘Suresh’:

attempts name qualify score

a 1 Anastasia yes 12.5

b 3 Dima no 9.0

c 2 Katherine yes 16.5

d 3 Suresh no NaN

e 2 Emily no 9.0

f 3 Michael yes 20.0

g 1 Matthew yes 14.5

h 1 Laura no NaN

i 2 Kevin no 8.0

j 1 Jonas yes 19.0
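
Note: the same change can be made with a boolean mask and .loc instead of replace. This is only a sketch on a shortened, hypothetical version of the sample data (four rows, two columns), not the full exam_data dictionary.

```python
import pandas as pd

# Shortened sample data, assumed here for brevity
df = pd.DataFrame({'name': ['Anastasia', 'Dima', 'Katherine', 'James'],
                   'attempts': [1, 3, 2, 3]},
                  index=['a', 'b', 'c', 'd'])

# The mask selects rows whose name is 'James'; .loc assigns the new value in place
df.loc[df['name'] == 'James', 'name'] = 'Suresh'

print(df)
```

The .loc form is useful when the condition depends on other columns as well, since any boolean expression can be used as the mask.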

3. Write a Pandas program to insert a new column in existing DataFrame.

Code:

import pandas as pd

import numpy as np

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data , index=labels)
print("Original rows:")

print(df)

color = ['Red','Blue','Orange','Red','White','White','Blue','Green','Green','Red']

df['color'] = color

print("\nNew DataFrame after inserting the 'color' column")

print(df)

Sample Output:

Original rows:

attempts name qualify score

a 1 Anastasia yes 12.5

b 3 Dima no 9.0

c 2 Katherine yes 16.5

d 3 James no NaN

e 2 Emily no 9.0

f 3 Michael yes 20.0

g 1 Matthew yes 14.5

h 1 Laura no NaN

i 2 Kevin no 8.0

j 1 Jonas yes 19.0

New DataFrame after inserting the 'color' column

attempts name qualify score color

a 1 Anastasia yes 12.5 Red

b 3 Dima no 9.0 Blue

c 2 Katherine yes 16.5 Orange

d 3 James no NaN Red

e 2 Emily no 9.0 White


f 3 Michael yes 20.0 White

g 1 Matthew yes 14.5 Blue

h 1 Laura no NaN Green

i 2 Kevin no 8.0 Green

j 1 Jonas yes 19.0 Red

4. Write a Pandas program to get list from DataFrame column headers.

Code:

import pandas as pd

import numpy as np

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data , index=labels)

print(list(df.columns.values))

Sample Output:

['attempts', 'name', 'qualify', 'score']
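
Note: there are other ways to get the headers as a list. The sketch below, on a two-row subset of the sample data (an assumption made for brevity), uses columns.tolist() and list(df). The order of the headers depends on how the installed pandas version orders dictionary keys, so no output is shown.

```python
import pandas as pd

# Two-row subset of the sample data, assumed here for brevity
df = pd.DataFrame({'name': ['Anastasia', 'Dima'],
                   'score': [12.5, 9.0],
                   'attempts': [1, 3],
                   'qualify': ['yes', 'no']},
                  index=['a', 'b'])

headers_from_tolist = df.columns.tolist()  # Index -> Python list
headers_from_list = list(df)               # iterating a DataFrame yields column labels

print(headers_from_tolist)
print(headers_from_list)
```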

--------->>>>>Pandas index:

1. Write a Pandas program to display the default index and set a column as an Index in a given
dataframe.
Test Data:

0 s001 V Alberto Franco 15/05/2002 35 street1 t1

1 s002 V Gino Mcneill 17/05/2002 32 street2 t2

2 s003 VI Ryan Parkes 16/02/1999 33 street3 t3

3 s001 VI Eesha Hinton 25/09/1998 30 street1 t4

4 s002 V Gino Mcneill 11/05/2002 31 street2 t5

5 s004 VI David Parkes 15/09/1997 32 street4 t6

Code:

import pandas as pd

df = pd.DataFrame({

'school_code': ['s001','s002','s003','s001','s002','s004'],

'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],

'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],

'date_Of_Birth': ['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],

'weight': [35, 32, 33, 30, 31, 32],

'address': ['street1', 'street2', 'street3', 'street1', 'street2', 'street4'],

't_id':['t1', 't2', 't3', 't4', 't5', 't6']})

print("Default Index:")

print(df.head(10))

print("\nschool_code as new Index:")

df1 = df.set_index('school_code')

print(df1)

print("\nt_id as new Index:")

df2 = df.set_index('t_id')

print(df2)

print("\nReset the index:")

df3 = df2.reset_index()

print(df3)

Sample Output:
Default Index:

school_code class name date_Of_Birth weight address t_id

0 s001 V Alberto Franco 15/05/2002 35 street1 t1

1 s002 V Gino Mcneill 17/05/2002 32 street2 t2

2 s003 VI Ryan Parkes 16/02/1999 33 street3 t3

3 s001 VI Eesha Hinton 25/09/1998 30 street1 t4

4 s002 V Gino Mcneill 11/05/2002 31 street2 t5

5 s004 VI David Parkes 15/09/1997 32 street4 t6

school_code as new Index:

class name date_Of_Birth weight address t_id

school_code

s001 V Alberto Franco 15/05/2002 35 street1 t1

s002 V Gino Mcneill 17/05/2002 32 street2 t2

s003 VI Ryan Parkes 16/02/1999 33 street3 t3

s001 VI Eesha Hinton 25/09/1998 30 street1 t4

s002 V Gino Mcneill 11/05/2002 31 street2 t5

s004 VI David Parkes 15/09/1997 32 street4 t6

t_id as new Index:

school_code class name date_Of_Birth weight address

t_id

t1 s001 V Alberto Franco 15/05/2002 35 street1

t2 s002 V Gino Mcneill 17/05/2002 32 street2

t3 s003 VI Ryan Parkes 16/02/1999 33 street3

t4 s001 VI Eesha Hinton 25/09/1998 30 street1

t5 s002 V Gino Mcneill 11/05/2002 31 street2

t6 s004 VI David Parkes 15/09/1997 32 street4

Reset the index:

t_id school_code class name date_Of_Birth weight address

0 t1 s001 V Alberto Franco 15/05/2002 35 street1

1 t2 s002 V Gino Mcneill 17/05/2002 32 street2

2 t3 s003 VI Ryan Parkes 16/02/1999 33 street3

3 t4 s001 VI Eesha Hinton 25/09/1998 30 street1

4 t5 s002 V Gino Mcneill 11/05/2002 31 street2

5 t6 s004 VI David Parkes 15/09/1997 32 street4


2. Write a Pandas program to create index labels using 64-bit integers and using floating-point
numbers in a given dataframe.

Test Data:

0 s001 V Alberto Franco 15/05/2002 35 street1 t1

1 s002 V Gino Mcneill 17/05/2002 32 street2 t2

2 s003 VI Ryan Parkes 16/02/1999 33 street3 t3

3 s001 VI Eesha Hinton 25/09/1998 30 street1 t4

4 s002 V Gino Mcneill 11/05/2002 31 street2 t5

5 s004 VI David Parkes 15/09/1997 32 street4 t6

Code:

import pandas as pd

print("Create an Int64Index:")

df_i64 = pd.DataFrame({

'school_code': ['s001','s002','s003','s001','s002','s004'],

'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],

'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],

'date_Of_Birth': ['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],

'weight': [35, 32, 33, 30, 31, 32],

'address': ['street1', 'street2', 'street3', 'street1', 'street2', 'street4']},

index=[1, 2, 3, 4, 5, 6])

print(df_i64)

print("\nView the Index:")

print(df_i64.index)

print("\nFloating-point labels using Float64Index:")

df_f64 = pd.DataFrame({

'school_code': ['s001','s002','s003','s001','s002','s004'],
'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],

'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],

'date_Of_Birth': ['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],

'weight': [35, 32, 33, 30, 31, 32],

'address': ['street1', 'street2', 'street3', 'street1', 'street2', 'street4']},

index=[.1, .2, .3, .4, .5, .6])

print(df_f64)

print("\nView the Index:")

print(df_f64.index)

Sample Output:

Create an Int64Index:

school_code class name date_Of_Birth weight address

1 s001 V Alberto Franco 15/05/2002 35 street1

2 s002 V Gino Mcneill 17/05/2002 32 street2

3 s003 VI Ryan Parkes 16/02/1999 33 street3

4 s001 VI Eesha Hinton 25/09/1998 30 street1

5 s002 V Gino Mcneill 11/05/2002 31 street2

6 s004 VI David Parkes 15/09/1997 32 street4

View the Index:

Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')

Floating-point labels using Float64Index:

school_code class name date_Of_Birth weight address

0.1 s001 V Alberto Franco 15/05/2002 35 street1

0.2 s002 V Gino Mcneill 17/05/2002 32 street2

0.3 s003 VI Ryan Parkes 16/02/1999 33 street3


0.4 s001 VI Eesha Hinton 25/09/1998 30 street1

0.5 s002 V Gino Mcneill 11/05/2002 31 street2

0.6 s004 VI David Parkes 15/09/1997 32 street4

View the Index:

Float64Index([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], dtype='float64')
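
Note: the class name printed for the index can differ between pandas versions, so checking the dtype of the index is a more portable test. A minimal sketch, assuming small hypothetical frames with only a weight column:

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_float_dtype

# Small frames assumed for illustration; only the index labels matter here
df_int = pd.DataFrame({'weight': [35, 32, 33]}, index=[1, 2, 3])
df_float = pd.DataFrame({'weight': [35, 32, 33]}, index=[0.1, 0.2, 0.3])

print(df_int.index.dtype)
print(df_float.index.dtype)

# dtype checks do not depend on the name of the index class
print(is_integer_dtype(df_int.index), is_float_dtype(df_float.index))

# Floating-point labels can be used with .loc like any other label
print(df_float.loc[0.2, 'weight'])
```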

--------->>>>>Pandas: String and Regular Expression

1. Write a Pandas program to convert all the string values to upper, lower cases in a given pandas series.
Also find the length of the string values.

Code:

import pandas as pd

import numpy as np

s = pd.Series(['X', 'Y', 'Z', 'Aaba', 'Baca', np.nan, 'CABA', None, 'bird', 'horse', 'dog'])

print("Original series:")

print(s)

print("\nConvert all string values of the said Series to upper case:")

print(s.str.upper())

print("\nConvert all string values of the said Series to lower case:")

print(s.str.lower())

print("\nLength of the string values of the said Series:")

print(s.str.len())

Sample Output:

Original series:
0 X

1 Y

2 Z

3 Aaba

4 Baca

5 NaN

6 CABA

7 None

8 bird

9 horse

10 dog

dtype: object

Convert all string values of the said Series to upper case:

0 X

1 Y

2 Z

3 AABA

4 BACA

5 NaN

6 CABA

7 None

8 BIRD

9 HORSE

10 DOG

dtype: object

Convert all string values of the said Series to lower case:

0 x
1 y

2 z

3 aaba

4 baca

5 NaN

6 caba

7 None

8 bird

9 horse

10 dog

dtype: object

Length of the string values of the said Series:

0 1.0

1 1.0

2 1.0

3 4.0

4 4.0

5 NaN

6 4.0

7 NaN

8 4.0

9 5.0

10 3.0

dtype: float64
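
Note: the lengths above are shown as floats because the missing values force a floating-point result. If whole-number lengths are wanted, one option is the nullable 'Int64' dtype. The sketch below assumes a shortened version of the sample Series.

```python
import numpy as np
import pandas as pd

# Shortened sample Series with two missing values (assumed for brevity)
s = pd.Series(['X', 'Aaba', np.nan, 'CABA', None, 'horse'])

# Convert the lengths to the nullable integer dtype so missing entries stay missing
lengths = s.str.len().astype('Int64')

print(lengths)
```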

2. Write a Pandas program to remove leading and trailing whitespace, left-side whitespace only, and
right-side whitespace only from the string values of a given Pandas series.
Code:

import pandas as pd

color1 = pd.Index([' Green', 'Black ', ' Red ', 'White', ' Pink '])

print("Original series:")

print(color1)

print("\nRemove whitespace")

print(color1.str.strip())

print("\nRemove left sided whitespace")

print(color1.str.lstrip())

print("\nRemove Right sided whitespace")

print(color1.str.rstrip())

Sample Output:

Original series:

Index([' Green', 'Black ', ' Red ', 'White', ' Pink '], dtype='object')

Remove whitespace

Index(['Green', 'Black', 'Red', 'White', 'Pink'], dtype='object')

Remove left sided whitespace

Index(['Green', 'Black ', 'Red ', 'White', 'Pink '], dtype='object')

Remove Right sided whitespace

Index([' Green', 'Black', ' Red', 'White', ' Pink'], dtype='object')

3. Write a Pandas program to count the occurrences of a specified substring in a DataFrame column.


Code:

import pandas as pd

df = pd.DataFrame({

'name_code': ['c001','c002','c022', 'c2002', 'c2222'],

'date_of_birth': ['12/05/2002','16/02/1999','25/09/1998','12/02/2022','15/09/1997'],

'age': [18.5, 21.2, 22.5, 22, 23]

})

print("Original DataFrame:")

print(df)

print("\nCount occurrence of 2 in name_code column:")

df['count'] = list(map(lambda x: x.count("2"), df['name_code']))

print(df)

Sample Output:

Original DataFrame:

name_code date_of_birth age

0 c001 12/05/2002 18.5

1 c002 16/02/1999 21.2

2 c022 25/09/1998 22.5

3 c2002 12/02/2022 22.0

4 c2222 15/09/1997 23.0

Count occurrence of 2 in name_code column:

name_code date_of_birth age count

0 c001 12/05/2002 18.5 0

1 c002 16/02/1999 21.2 1

2 c022 25/09/1998 22.5 2


3 c2002 12/02/2022 22.0 2

4 c2222 15/09/1997 23.0 4
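
Note: the count can also be written with the vectorized string accessor instead of map and a lambda. A minimal sketch, assuming only the name_code column of the sample data:

```python
import pandas as pd

df = pd.DataFrame({'name_code': ['c001', 'c002', 'c022', 'c2002', 'c2222']})

# Series.str.count counts matches of a pattern (a regular expression) in each value
df['count'] = df['name_code'].str.count('2')

print(df)
```

Because the argument is treated as a regular expression, characters such as '.' or '+' would need escaping if they were the substring being counted.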

4. Write a Pandas program to swap the cases of a specified character column in a given DataFrame.

Code:

import pandas as pd

df = pd.DataFrame({

'company_code': ['Abcd','EFGF', 'zefsalf', 'sdfslew', 'zekfsdf'],

'date_of_sale': ['12/05/2002','16/02/1999','25/09/1998','12/02/2022','15/09/1997'],

'sale_amount': [12348.5, 233331.2, 22.5, 2566552.0, 23.0]

})

print("Original DataFrame:")

print(df)

print("\nSwap cases in company_code:")

df['swapped_company_code'] = list(map(lambda x: x.swapcase(), df['company_code']))

print(df)

Sample Output:

Original DataFrame:

company_code date_of_sale sale_amount

0 Abcd 12/05/2002 12348.5

1 EFGF 16/02/1999 233331.2

2 zefsalf 25/09/1998 22.5

3 sdfslew 12/02/2022 2566552.0

4 zekfsdf 15/09/1997 23.0


Swap cases in company_code:

company_code ... swapped_company_code

0 Abcd ... aBCD

1 EFGF ... efgf

2 zefsalf ... ZEFSALF

3 sdfslew ... SDFSLEW

4 zekfsdf ... ZEKFSDF

[5 rows x 4 columns]
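
Note: the case swap is also available through the string accessor. A minimal sketch, assuming only the company_code column of the sample data:

```python
import pandas as pd

df = pd.DataFrame({'company_code': ['Abcd', 'EFGF', 'zefsalf', 'sdfslew', 'zekfsdf']})

# Series.str.swapcase applies str.swapcase to every value in the column
df['swapped_company_code'] = df['company_code'].str.swapcase()

print(df)
```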

--------->>>>>Pandas Joining and Merging DataFrames:

1. Write a Pandas program to join the two given dataframes along rows and assign all data.

Test Data:

student_data1:

student_id name marks

0 S1 Danniella Fenton 200

1 S2 Ryder Storey 210

2 S3 Bryce Jensen 190

3 S4 Ed Bernal 222

4 S5 Kwame Morin 199

student_data2:

student_id name marks

0 S4 Scarlette Fisher 201

1 S5 Carla Williamson 200

2 S6 Dante Morse 198

3 S7 Kaiser William 219


4 S8 Madeeha Preston 201

Code:

import pandas as pd

student_data1 = pd.DataFrame({

'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],

'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'],

'marks': [200, 210, 190, 222, 199]})

student_data2 = pd.DataFrame({

'student_id': ['S4', 'S5', 'S6', 'S7', 'S8'],

'name': ['Scarlette Fisher', 'Carla Williamson', 'Dante Morse', 'Kaiser William', 'Madeeha Preston'],

'marks': [201, 200, 198, 219, 201]})

print("Original DataFrames:")

print(student_data1)

print("-------------------------------------")

print(student_data2)

print("\nJoin the said two dataframes along rows:")

result_data = pd.concat([student_data1, student_data2])

print(result_data)

Sample Output:

Original DataFrames:

student_id name marks

0 S1 Danniella Fenton 200


1 S2 Ryder Storey 210

2 S3 Bryce Jensen 190

3 S4 Ed Bernal 222

4 S5 Kwame Morin 199

-------------------------------------

student_id name marks

0 S4 Scarlette Fisher 201

1 S5 Carla Williamson 200

2 S6 Dante Morse 198

3 S7 Kaiser William 219

4 S8 Madeeha Preston 201

Join the said two dataframes along rows:

student_id name marks

0 S1 Danniella Fenton 200

1 S2 Ryder Storey 210

2 S3 Bryce Jensen 190

3 S4 Ed Bernal 222

4 S5 Kwame Morin 199

0 S4 Scarlette Fisher 201

1 S5 Carla Williamson 200

2 S6 Dante Morse 198

3 S7 Kaiser William 219

4 S8 Madeeha Preston 201

2. Write a Pandas program to append a list of dictionaries or series to an existing DataFrame and display
the combined data

Test Data:
student_id name marks

0 S1 Danniella Fenton 200

1 S2 Ryder Storey 210

2 S3 Bryce Jensen 190

3 S4 Ed Bernal 222

4 S5 Kwame Morin 199

Dictionary:

student_id S6

name Scarlette Fisher

marks 205

dtype: object

Code:

import pandas as pd

student_data1 = pd.DataFrame({

'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],

'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'],

'marks': [200, 210, 190, 222, 199]})

s6 = pd.Series(['S6', 'Scarlette Fisher', 205], index=['student_id', 'name', 'marks'])

dicts = [{'student_id': 'S6', 'name': 'Scarlette Fisher', 'marks': 203},

{'student_id': 'S7', 'name': 'Bryce Jensen', 'marks': 207}]

print("Original DataFrames:")

print(student_data1)
print("\nDictionary:")

print(s6)

combined_data = pd.concat([student_data1, pd.DataFrame(dicts)], ignore_index=True, sort=False)

print("\nCombined Data:")

print(combined_data)

Sample Output:

Original DataFrames:

student_id name marks

0 S1 Danniella Fenton 200

1 S2 Ryder Storey 210

2 S3 Bryce Jensen 190

3 S4 Ed Bernal 222

4 S5 Kwame Morin 199

Dictionary:

student_id S6

name Scarlette Fisher

marks 205

dtype: object

Combined Data:

student_id name marks

0 S1 Danniella Fenton 200

1 S2 Ryder Storey 210

2 S3 Bryce Jensen 190

3 S4 Ed Bernal 222

4 S5 Kwame Morin 199


5 S6 Scarlette Fisher 203

6 S7 Bryce Jensen 207
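
Note: the program above defines the Series s6 but only appends the list of dictionaries. The following sketch shows one way a Series could be appended as a new row, by turning it into a one-row DataFrame first. The variable name combined_with_series is hypothetical.

```python
import pandas as pd

student_data1 = pd.DataFrame({
    'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
    'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'],
    'marks': [200, 210, 190, 222, 199]})

s6 = pd.Series(['S6', 'Scarlette Fisher', 205], index=['student_id', 'name', 'marks'])

# to_frame() gives one column; .T transposes it into a single row
# whose column labels come from the Series index
combined_with_series = pd.concat([student_data1, s6.to_frame().T], ignore_index=True)

print(combined_with_series)
```

The marks column may end up with object dtype after this step, because the Series holds mixed types; it can be converted back with astype(int) if needed.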

3. Write a Pandas program to join the two dataframes with matching records from both sides where
available.

Test Data:

student_data1:

student_id name marks

0 S1 Danniella Fenton 200

1 S2 Ryder Storey 210

2 S3 Bryce Jensen 190

3 S4 Ed Bernal 222

4 S5 Kwame Morin 199

student_data2:

student_id name marks

0 S4 Scarlette Fisher 201

1 S5 Carla Williamson 200

2 S6 Dante Morse 198

3 S7 Kaiser William 219

4 S8 Madeeha Preston 201

Code:

import pandas as pd

student_data1 = pd.DataFrame({

'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],

'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'],
'marks': [200, 210, 190, 222, 199]})

student_data2 = pd.DataFrame({

'student_id': ['S4', 'S5', 'S6', 'S7', 'S8'],

'name': ['Scarlette Fisher', 'Carla Williamson', 'Dante Morse', 'Kaiser William', 'Madeeha Preston'],

'marks': [201, 200, 198, 219, 201]})

print("Original DataFrames:")

print(student_data1)

print(student_data2)

merged_data = pd.merge(student_data1, student_data2, on='student_id', how='outer')

print("Merged data (outer join):")

print(merged_data)

Sample Output:

Original DataFrames:

student_id name marks

0 S1 Danniella Fenton 200

1 S2 Ryder Storey 210

2 S3 Bryce Jensen 190

3 S4 Ed Bernal 222

4 S5 Kwame Morin 199

student_id name marks

0 S4 Scarlette Fisher 201

1 S5 Carla Williamson 200

2 S6 Dante Morse 198

3 S7 Kaiser William 219

4 S8 Madeeha Preston 201


Merged data (outer join):

student_id name_x marks_x name_y marks_y

0 S1 Danniella Fenton 200.0 NaN NaN

1 S2 Ryder Storey 210.0 NaN NaN

2 S3 Bryce Jensen 190.0 NaN NaN

3 S4 Ed Bernal 222.0 Scarlette Fisher 201.0

4 S5 Kwame Morin 199.0 Carla Williamson 200.0

5 S6 NaN NaN Dante Morse 198.0

6 S7 NaN NaN Kaiser William 219.0

7 S8 NaN NaN Madeeha Preston 201.0
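
Note: to see which side each row of an outer join came from, merge accepts indicator=True, which adds a _merge column. A minimal sketch, assuming reduced versions of the two sample frames (student_id and marks only):

```python
import pandas as pd

# Reduced versions of the sample frames, assumed for brevity
student_data1 = pd.DataFrame({'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
                              'marks': [200, 210, 190, 222, 199]})
student_data2 = pd.DataFrame({'student_id': ['S4', 'S5', 'S6', 'S7', 'S8'],
                              'marks': [201, 200, 198, 219, 201]})

# indicator=True adds a '_merge' column with 'left_only', 'right_only' or 'both'
merged = pd.merge(student_data1, student_data2, on='student_id',
                  how='outer', indicator=True)
counts = merged['_merge'].value_counts()

print(merged)
print(counts)
```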

--------->>>>>Pandas Time Series:

1. Write a Pandas program to create

a) Datetime object for Jan 15 2012.

b) Specific date and time of 9:20 pm.

c) Local date and time.

d) A date without time.

e) Current date.

f) Time from a datetime.

g) Current local time.

Code:

from datetime import datetime

print("Datetime object for Jan 15 2012:")

print(datetime(2012, 1, 15))

print("\nSpecific date and time of 9:20 pm")


print(datetime(2011, 1, 15, 21, 20))

print("\nLocal date and time:")

print(datetime.now())

print("\nA date without time: ")

print(datetime.date(datetime(2012, 5, 22)))

print("\nCurrent date:")

print(datetime.now().date())

print("\nTime from a datetime:")

print(datetime.time(datetime(2012, 12, 15, 18, 12)))

print("\nCurrent local time:")

print(datetime.now().time())

Sample Output:

Datetime object for Jan 15 2012:

2012-01-15 00:00:00

Specific date and time of 9:20 pm

2011-01-15 21:20:00

Local date and time:

2020-08-17 09:56:17.459790

A date without time:

2012-05-22

Current date:

2020-08-17
Time from a datetime:

18:12:00

Current local time:

09:56:17.461250

2. Write a Pandas program to create a date from a given year, month, day and another date from a given
string formats.

Code:

from datetime import datetime

date1 = datetime(year=2020, month=12, day=25)

print("Date from a given year, month, day:")

print(date1)

from dateutil import parser

date2 = parser.parse("1st of January, 2021")

print("\nDate from a given string formats:")

print(date2)

Sample Output:

Date from a given year, month, day:

2020-12-25 00:00:00

Date from a given string formats:

2021-01-01 00:00:00
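
Note: the program above uses the standard datetime and dateutil modules. A sketch of the same idea with pandas itself is given below, using pd.Timestamp and pd.to_datetime. The input string and its day/month/year format are assumptions chosen for illustration.

```python
import pandas as pd

# Date from a given year, month and day
date1 = pd.Timestamp(year=2020, month=12, day=25)

# Date from a string; the explicit format avoids day/month ambiguity
date2 = pd.to_datetime("01/01/2021", format="%d/%m/%Y")

print(date1)
print(date2)
```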
3. Write a Pandas program to create a time-series with two index labels and random values. Also print
the type of the index.

Code:

import pandas as pd

import numpy as np

from datetime import datetime

dates = [datetime(2011, 9, 1), datetime(2011, 9, 2)]

print("Time-series with two index labels:")

time_series = pd.Series(np.random.randn(2), dates)

print(time_series)

print("\nType of the index:")

print(type(time_series.index))

Sample Output:

Time-series with two index labels:

2011-09-01 -0.257567

2011-09-02 0.947341

dtype: float64

Type of the index:

<class 'pandas.core.indexes.datetimes.DatetimeIndex'>

--------->>>>>Pandas Grouping and Aggregating:

Consider dataset:
school class name date_Of_Birth age height weight address

S1 s001 V Alberto Franco 15/05/2002 12 173 35 street1

S2 s002 V Gino Mcneill 17/05/2002 12 192 32 street2

S3 s003 VI Ryan Parkes 16/02/1999 13 186 33 street3

S4 s001 VI Eesha Hinton 25/09/1998 13 167 30 street1

S5 s002 V Gino Mcneill 11/05/2002 14 151 31 street2

S6 s004 VI David Parkes 15/09/1997 12 159 32 street4

1. Write a Pandas program to split the following dataframe into groups based on school code. Also check
the type of GroupBy object.

Code:

import pandas as pd

pd.set_option('display.max_rows', None)

#pd.set_option('display.max_columns', None)

student_data = pd.DataFrame({

'school_code': ['s001','s002','s003','s001','s002','s004'],

'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],

'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],

'date_Of_Birth': ['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],

'age': [12, 12, 13, 13, 14, 12],

'height': [173, 192, 186, 167, 151, 159],

'weight': [35, 32, 33, 30, 31, 32],

'address': ['street1', 'street2', 'street3', 'street1', 'street2', 'street4']},

index=['S1', 'S2', 'S3', 'S4', 'S5', 'S6'])

print("Original DataFrame:")

print(student_data)
print('\nSplit the said data on school_code wise:')

result = student_data.groupby('school_code')

for name, group in result:

    print("\nGroup:")

    print(name)

    print(group)

print("\nType of the object:")

print(type(result))

Sample Output:

Original DataFrame:

school_code class name ... height weight address

S1 s001 V Alberto Franco ... 173 35 street1

S2 s002 V Gino Mcneill ... 192 32 street2

S3 s003 VI Ryan Parkes ... 186 33 street3

S4 s001 VI Eesha Hinton ... 167 30 street1

S5 s002 V Gino Mcneill ... 151 31 street2

S6 s004 VI David Parkes ... 159 32 street4

[6 rows x 8 columns]

Split the said data on school_code wise:

Group:

s001

school_code class name ... height weight address

S1 s001 V Alberto Franco ... 173 35 street1

S4 s001 VI Eesha Hinton ... 167 30 street1


[2 rows x 8 columns]

Group:

s002

school_code class name ... height weight address

S2 s002 V Gino Mcneill ... 192 32 street2

S5 s002 V Gino Mcneill ... 151 31 street2

[2 rows x 8 columns]

Group:

s003

school_code class name ... height weight address

S3 s003 VI Ryan Parkes ... 186 33 street3

[1 rows x 8 columns]

Group:

s004

school_code class name ... height weight address

S6 s004 VI David Parkes ... 159 32 street4

[1 rows x 8 columns]

Type of the object:

<class 'pandas.core.groupby.groupby.DataFrameGroupBy'>
2. Write a Pandas program to split the following dataframe by school code and get mean, min, and max
value of age for each school.

Code:

import pandas as pd

pd.set_option('display.max_rows', None)

#pd.set_option('display.max_columns', None)

student_data = pd.DataFrame({

'school_code': ['s001','s002','s003','s001','s002','s004'],

'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],

'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],

'date_Of_Birth': ['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],

'age': [12, 12, 13, 13, 14, 12],

'height': [173, 192, 186, 167, 151, 159],

'weight': [35, 32, 33, 30, 31, 32],

'address': ['street1', 'street2', 'street3', 'street1', 'street2', 'street4']},

index=['S1', 'S2', 'S3', 'S4', 'S5', 'S6'])

print("Original DataFrame:")

print(student_data)

print('\nMean, min, and max value of age for each value of the school:')

grouped_single = student_data.groupby('school_code').agg({'age': ['mean', 'min', 'max']})

print(grouped_single)

Sample Output:

Original DataFrame:

school_code class name ... height weight address

S1 s001 V Alberto Franco ... 173 35 street1


S2 s002 V Gino Mcneill ... 192 32 street2

S3 s003 VI Ryan Parkes ... 186 33 street3

S4 s001 VI Eesha Hinton ... 167 30 street1

S5 s002 V Gino Mcneill ... 151 31 street2

S6 s004 VI David Parkes ... 159 32 street4

[6 rows x 8 columns]

Mean, min, and max value of age for each value of the school:

age

mean min max

school_code

s001 12.5 12 13

s002 13.0 12 14

s003 13.0 13 13

s004 12.0 12 12
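
Note: the result above has two header levels (age, then mean/min/max). If flat column names are preferred, named aggregation can be used. The sketch below assumes only the school_code and age columns of the sample data; the output column names are hypothetical.

```python
import pandas as pd

student_data = pd.DataFrame({
    'school_code': ['s001', 's002', 's003', 's001', 's002', 's004'],
    'age': [12, 12, 13, 13, 14, 12]},
    index=['S1', 'S2', 'S3', 'S4', 'S5', 'S6'])

# Named aggregation: new_column=(source_column, function)
age_stats = student_data.groupby('school_code').agg(
    mean_age=('age', 'mean'),
    min_age=('age', 'min'),
    max_age=('age', 'max'))

print(age_stats)
```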

----------->>>>>Pandas styling

1. Create a dataframe of ten rows, four columns with random values. Write a Pandas program to
highlight the negative numbers red and positive numbers black

Code:

import pandas as pd

import numpy as np

np.random.seed(24)

df = pd.DataFrame({'A': np.linspace(1, 10, 10)})

df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],


axis=1)

print("Original array:")

print(df)

def color_negative_red(val):

    color = 'red' if val < 0 else 'black'

    return 'color: %s' % color

print("\nNegative numbers red and positive numbers black:")

df.style.applymap(color_negative_red)

Original array:

A B C D E

0 1.0 1.329212 -0.770033 -0.316280 -0.990810

1 2.0 -1.070816 -1.438713 0.564417 0.295722

2 3.0 -1.626404 0.219565 0.678805 1.889273

3 4.0 0.961538 0.104011 -0.481165 0.850229

4 5.0 1.453425 1.057737 0.165562 0.515018

5 6.0 -1.336936 0.562861 1.392855 -0.063328

6 7.0 0.121668 1.207603 -0.002040 1.627796

7 8.0 0.354493 1.037528 -0.385684 0.519818

8 9.0 1.686583 -1.325963 1.428984 -2.089354

9 10.0 -0.129820 0.631523 -0.586538 0.290720

Negative numbers red and positive numbers black:

Sample Output: [styled DataFrame; rendered image not reproduced]

2. Create a dataframe of ten rows, four columns with random values. Write a
Pandas program to highlight the maximum value in each column.
Code:

import pandas as pd
import numpy as np
np.random.seed(24)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4),
columns=list('BCDE'))],
axis=1)
df.iloc[0, 2] = np.nan
df.iloc[3, 3] = np.nan
df.iloc[4, 1] = np.nan
df.iloc[9, 4] = np.nan
print("Original array:")
print(df)
def highlight_max(s):
    '''
    Highlight the maximum in a Series green.
    '''
    is_max = s == s.max()
    return ['background-color: green' if v else '' for v in is_max]

print("\nHighlight the maximum value in each column:")


df.style.apply(highlight_max, subset=pd.IndexSlice[:, ['B', 'C', 'D', 'E']])
Original array:
A B C D E

0 1.0 1.329212 NaN -0.316280 -0.990810

1 2.0 -1.070816 -1.438713 0.564417 0.295722

2 3.0 -1.626404 0.219565 0.678805 1.889273

3 4.0 0.961538 0.104011 NaN 0.850229

4 5.0 NaN 1.057737 0.165562 0.515018

5 6.0 -1.336936 0.562861 1.392855 -0.063328

6 7.0 0.121668 1.207603 -0.002040 1.627796

7 8.0 0.354493 1.037528 -0.385684 0.519818

8 9.0 1.686583 -1.325963 1.428984 -2.089354

9 10.0 -0.129820 0.631523 -0.586538 NaN


Highlight the maximum value in each column:

Sample Output: [styled DataFrame; rendered image not reproduced]

3. Create a dataframe of ten rows, four columns with random values. Write a
Pandas program to highlight dataframe's specific columns.

Code:

import pandas as pd
import numpy as np
np.random.seed(24)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4),
columns=list('BCDE'))],
axis=1)
df.iloc[0, 2] = np.nan
df.iloc[3, 3] = np.nan
df.iloc[4, 1] = np.nan
df.iloc[9, 4] = np.nan
print("Original array:")
print(df)
def highlight_cols(s):
    color = 'grey'
    return 'background-color: %s' % color
print("\nHighlight specific columns:")
df.style.applymap(highlight_cols, subset=pd.IndexSlice[:, ['B', 'C']])
Original array:

A B C D E

0 1.0 1.329212 NaN -0.316280 -0.990810

1 2.0 -1.070816 -1.438713 0.564417 0.295722

2 3.0 -1.626404 0.219565 0.678805 1.889273

3 4.0 0.961538 0.104011 NaN 0.850229

4 5.0 NaN 1.057737 0.165562 0.515018

5 6.0 -1.336936 0.562861 1.392855 -0.063328

6 7.0 0.121668 1.207603 -0.002040 1.627796

7 8.0 0.354493 1.037528 -0.385684 0.519818

8 9.0 1.686583 -1.325963 1.428984 -2.089354

9 10.0 -0.129820 0.631523 -0.586538 NaN

Highlight specific columns:


Sample Output: [styled DataFrame; rendered image not reproduced]

----------->>>>>Pandas Excel

1. Write a Pandas program to import given excel data (coalpublic2013.xlsx) into a Pandas dataframe.

Excel Data:

coalpublic2013.xlsx:

Year MSHA ID Mine_Name Production Labor_Hours


2013 103381 Highwall Miner 56,004 22,392
2013 103404 Reid School Mine 28,807 8,447
2013 100759 Underground Min 14,40,115 4,74,784
2013 103246 Bear Creek 87,587 29,193

Code:

import pandas as pd
import numpy as np
df = pd.read_excel(r'E:\coalpublic2013.xlsx')
print(df.head())
Sample Output:
Year MSHA ID Mine_Name Production Labor_Hours
0 2013 103381 Highwall Miner 56004 22392
1 2013 103404 Reid School Mine 28807 8447
2 2013 100759 Underground Min 1440115 474784
3 2013 103246 Bear Creek 87587 29193

2. Write a Pandas program to find the sum, mean, max, min value of
'Production' column of coalpublic2013.xlsx file

Excel Data:

coalpublic2013.xlsx:

Year MSHA ID Mine_Name Production Labor_Hours


2013 103381 Highwall Miner 56,004 22,392
2013 103404 Reid School Mine 28,807 8,447
2013 100759 Underground Min 14,40,115 4,74,784
2013 103246 Bear Creek 87,587 29,193

Code:

import pandas as pd
import numpy as np
df = pd.read_excel(r'E:\coalpublic2013.xlsx')
print("Sum: ",df["Production"].sum())
print("Mean: ",df["Production"].mean())
print("Maximum: ",df["Production"].max())
print("Minimum: ",df["Production"].min())
Sample Output:
Sum: 1612513
Mean: 403128.25
Maximum: 1440115
Minimum: 28807
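
Note: the four statistics can also be computed in one call with agg. The sketch below rebuilds the four sample rows in memory instead of reading the Excel file (an assumption, so that it runs without coalpublic2013.xlsx). With the real file, the DataFrame would come from pd.read_excel as in the program above.

```python
import pandas as pd

# The four sample rows shown in the Excel data, typed in by hand for illustration
df = pd.DataFrame({
    'Mine_Name': ['Highwall Miner', 'Reid School Mine', 'Underground Min', 'Bear Creek'],
    'Production': [56004, 28807, 1440115, 87587]})

# agg with a list of function names returns one value per function
stats = df['Production'].agg(['sum', 'mean', 'max', 'min'])

print(stats)
```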
