
Project_Prog

The document provides various examples of using the Pandas library in Python for data manipulation and visualization. It includes programs to count rows and columns in a DataFrame, select data based on conditions, handle missing values, import/export CSV files, and create different types of charts. Each example is accompanied by code snippets and expected outputs.

Uploaded by

tapaskumarmahato

Write a Pandas program to count the number of rows and columns of a DataFrame, with a practical example.

Name    Score   Age   Qualify_label
Amit    98      20    yes
Kamal   80      25    yes
Ram     60      22    no
Riya    85      24    yes
Anup    49      21    no
Suman   92      20    yes

Ans:

# importing pandas
import pandas as pd

# sample data taken from the table above
result_data = {'name': ['Amit', 'Kamal', 'Ram', 'Riya', 'Anup', 'Suman'],
               'score': [98, 80, 60, 85, 49, 92],
               'age': [20, 25, 22, 24, 21, 20],
               'qualify_label': ['yes', 'yes', 'no', 'yes', 'no', 'yes']}

# creating dataframe
df = pd.DataFrame(result_data)

# computing number of rows
rows = len(df.axes[0])

# computing number of columns
cols = len(df.axes[1])

print("Number of Rows:", rows)
print("Number of Columns:", cols)

Output:

Number of Rows: 6
Number of Columns: 4
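
As a shorter alternative, the same counts can be read from the DataFrame's shape attribute. The sketch below is only an illustration and reuses the sample table from the question; df.shape returns a (rows, columns) tuple and df.size gives the total number of cells.

```python
import pandas as pd

# Sample data from the table in the question
df = pd.DataFrame({'name': ['Amit', 'Kamal', 'Ram', 'Riya', 'Anup', 'Suman'],
                   'score': [98, 80, 60, 85, 49, 92],
                   'age': [20, 25, 22, 24, 21, 20],
                   'qualify_label': ['yes', 'yes', 'no', 'yes', 'no', 'yes']})

# shape is a (number of rows, number of columns) tuple
rows, cols = df.shape
print("Number of Rows:", rows)
print("Number of Columns:", cols)

# size is rows * columns
print("Total cells:", df.size)
```

len(df) is another common way to get just the row count.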
Write a Pandas program to select the names of persons whose height is between 5 and 5.5 (both values inclusive).

Sample data:
'name': ['Asha', 'Radha', 'Kamal', 'Divy', 'Anjali'],
'height': [5.5, 5, np.nan, 5.9, np.nan],
'age': [11, 23, 22, 33, 22]
Solution:

import pandas as pd
import numpy as np

pers_data = {'name': ['Asha', 'Radha', 'Kamal', 'Divy', 'Anjali'],
             'height': [5.5, 5, np.nan, 5.9, np.nan],
             'age': [11, 23, 22, 33, 22]}
labels = ['a', 'b', 'c', 'd', 'e']
df = pd.DataFrame(pers_data, index=labels)

print("Persons whose height is between 5 and 5.5")
# comparisons with NaN are False, so rows with a missing height are skipped;
# only the 'name' column is selected, as the question asks
print(df.loc[(df['height'] >= 5) & (df['height'] <= 5.5), 'name'])
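
To see why the rows with a missing height drop out, it can help to look at the boolean mask on its own. The following is a small sketch using the same sample data; the variable names in_range and names are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Asha', 'Radha', 'Kamal', 'Divy', 'Anjali'],
                   'height': [5.5, 5, np.nan, 5.9, np.nan],
                   'age': [11, 23, 22, 33, 22]},
                  index=['a', 'b', 'c', 'd', 'e'])

# Any comparison against NaN evaluates to False, so Kamal and Anjali
# are excluded without special handling.
in_range = (df['height'] >= 5) & (df['height'] <= 5.5)
print(in_range.tolist())   # [True, True, False, False, False]

# Keep only the 'name' column of the matching rows
names = df.loc[in_range, 'name'].tolist()
print(names)               # ['Asha', 'Radha']
```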

Write a Pandas program to select the rows where the score is between 15 and 20 (inclusive).
import pandas as pd
import numpy as np

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
                      'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data, index=labels)
print("Rows where score between 15 and 20 (inclusive):")
print(df[df['score'].between(15, 20)])

Output:

Rows where score between 15 and 20 (inclusive):
        name  score  attempts qualify
c  Katherine   16.5         2     yes
f    Michael   20.0         3     yes
j      Jonas   19.0         1     yes
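
Series.between includes both end points by default, which is why the score of 20 is kept. As a hedged sketch, and assuming pandas 1.3 or newer (where the inclusive parameter accepts the strings "both", "neither", "left" and "right"), the end points can be excluded like this; the name strict is used only for illustration.

```python
import numpy as np
import pandas as pd

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
                      'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19]}
df = pd.DataFrame(exam_data, index=list('abcdefghij'))

# Default behaviour: 15 <= score <= 20
both = df[df['score'].between(15, 20)]

# Assumption: pandas >= 1.3, where inclusive takes a string.
# "neither" means 15 < score < 20, so the row with score 20 is dropped.
strict = df[df['score'].between(15, 20, inclusive='neither')]

print(both.index.tolist())
print(strict.index.tolist())
```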
Write a Pandas program to find and replace the missing-value markers in a given DataFrame, which do not carry any useful information.

Example:
Missing values: ?, --
Replace those values with NaN

Test Data:
    ord_no  purch_amt    ord_date  customer_id  salesman_id
0    70001      150.5           ?         3002         5002
1      NaN     270.65  2012-09-10         3001         5003
2    70002      65.26         NaN         3001            ?
3    70004      110.5  2012-08-17         3003         5001
4      NaN      948.5  2012-09-10         3002          NaN
5    70005     2400.6  2012-07-27         3001         5002
6       --       5760  2012-09-10         3001         5001
7    70010          ?  2012-10-10         3004            ?
8    70003      12.43  2012-10-10           --         5003
9    70012     2480.4  2012-06-27         3002         5002
10     NaN     250.45  2012-08-17         3001         5003
11   70013     3045.6  2012-04-25         3001           --
Sample Solution:

Python Code:

import pandas as pd
import numpy as np

pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)

df = pd.DataFrame({
    'ord_no': [70001, np.nan, 70002, 70004, np.nan, 70005, "--", 70010, 70003, 70012, np.nan, 70013],
    'purch_amt': [150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 5760, "?", 12.43, 2480.4, 250.45, 3045.6],
    'ord_date': ['?', '2012-09-10', np.nan, '2012-08-17', '2012-09-10', '2012-07-27',
                 '2012-09-10', '2012-10-10', '2012-10-10', '2012-06-27', '2012-08-17', '2012-04-25'],
    'customer_id': [3002, 3001, 3001, 3003, 3002, 3001, 3001, 3004, "--", 3002, 3001, 3001],
    'salesman_id': [5002, 5003, "?", 5001, np.nan, 5002, 5001, "?", 5003, 5002, 5003, "--"]})

print("Original Orders DataFrame:")
print(df)

print("\nReplace the missing values with NaN:")
result = df.replace({"?": np.nan, "--": np.nan})
print(result)

Sample Output:

Original Orders DataFrame:
    ord_no  purch_amt    ord_date  customer_id  salesman_id
0    70001      150.5           ?         3002         5002
1      NaN     270.65  2012-09-10         3001         5003
2    70002      65.26         NaN         3001            ?
3    70004      110.5  2012-08-17         3003         5001
4      NaN      948.5  2012-09-10         3002          NaN
5    70005     2400.6  2012-07-27         3001         5002
6       --       5760  2012-09-10         3001         5001
7    70010          ?  2012-10-10         3004            ?
8    70003      12.43  2012-10-10           --         5003
9    70012     2480.4  2012-06-27         3002         5002
10     NaN     250.45  2012-08-17         3001         5003
11   70013     3045.6  2012-04-25         3001           --

Replace the missing values with NaN:
     ord_no  purch_amt    ord_date  customer_id  salesman_id
0   70001.0     150.50         NaN       3002.0       5002.0
1       NaN     270.65  2012-09-10       3001.0       5003.0
2   70002.0      65.26         NaN       3001.0          NaN
3   70004.0     110.50  2012-08-17       3003.0       5001.0
4       NaN     948.50  2012-09-10       3002.0          NaN
5   70005.0    2400.60  2012-07-27       3001.0       5002.0
6       NaN    5760.00  2012-09-10       3001.0       5001.0
7   70010.0        NaN  2012-10-10       3004.0          NaN
8   70003.0      12.43  2012-10-10          NaN       5003.0
9   70012.0    2480.40  2012-06-27       3002.0       5002.0
10      NaN     250.45  2012-08-17       3001.0       5003.0
11  70013.0    3045.60  2012-04-25       3001.0          NaN
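
When the data comes from a CSV file rather than a hand-built dictionary, the same markers can be turned into NaN while reading. The sketch below is an illustration only: it assumes a few rows of the test data have been stored as CSV text (held in memory here through io.StringIO instead of a real file) and uses the na_values parameter of pd.read_csv.

```python
import io
import pandas as pd

# Assumption: a few rows of the test data stored as CSV text.
# An empty field stands for a value that was already NaN.
csv_text = """ord_no,purch_amt,ord_date,customer_id,salesman_id
70001,150.5,?,3002,5002
,270.65,2012-09-10,3001,5003
70002,65.26,,3001,?
--,5760,2012-09-10,3001,5001
"""

# na_values lists extra strings that read_csv should treat as missing
orders = pd.read_csv(io.StringIO(csv_text), na_values=["?", "--"])
print(orders)

# number of missing values in each column
missing_per_column = orders.isna().sum()
print(missing_per_column)
```

Because the markers never reach the DataFrame, the numeric columns are parsed as numbers straight away and no separate replace step is needed.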

Write a program to import and export data between Pandas and a CSV file.

import pandas as pd

# import: read a CSV file into a DataFrame
df = pd.read_csv("C:\\Users\\Desktop\\covid19.csv")

# export: convert a DataFrame to CSV text
data = {'Name': ['Smith', 'Parker'], 'ID': [101, 102], 'Language': ['Python', 'JavaScript']}
info = pd.DataFrame(data)
print('DataFrame Values:\n', info)
# default CSV: with no file path, to_csv() returns the CSV as a string
csv_data = info.to_csv()
print('\nCSV String Values:\n', csv_data)
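
The program above produces CSV text but does not write it to disk. A possible round trip through an actual file might look like the sketch below. The file name info.csv and the temporary directory are assumptions made so that the example runs anywhere; in practice any writable path can be passed to to_csv.

```python
import os
import tempfile
import pandas as pd

info = pd.DataFrame({'Name': ['Smith', 'Parker'],
                     'ID': [101, 102],
                     'Language': ['Python', 'JavaScript']})

with tempfile.TemporaryDirectory() as tmp_dir:
    # hypothetical file name inside a throwaway directory
    csv_path = os.path.join(tmp_dir, 'info.csv')

    # export: index=False leaves the row labels out of the file
    info.to_csv(csv_path, index=False)

    # import: read the same file back into a new DataFrame
    restored = pd.read_csv(csv_path)

print(restored)
```

Passing index=False avoids an extra unnamed column appearing when the file is read back.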
Given the school result data, analyse the performance of the students on different parameters, e.g. subject-wise or class-wise.
import pandas as pd
import matplotlib.pyplot as plt
# Simple Line Chart with setting of Label of X and Y axis,
# title for chart line and color of line
subject = ['Physics', 'Chemistry', 'Mathematics', 'Biology', 'Computer']
marks = [80, 75, 70, 78, 82]
# To draw line in red colour
plt.plot(subject, marks, 'r', marker='*')
# To Write Title of the Line Chart
plt.title('Marks Scored')
# To Put Label At X Axis
plt.xlabel('SUBJECT')
# To Put Label At Y Axis
plt.ylabel('MARKS')
plt.show()
Output: (line chart titled 'Marks Scored', with SUBJECT on the X axis and MARKS on the Y axis)
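
The chart shows one set of marks per subject. For the subject-wise and class-wise analysis mentioned in the question, a groupby on a results table is one possible approach. The sketch below uses a small hypothetical table: the class labels and marks are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical result data: classes and marks are illustrative only
results = pd.DataFrame({
    'class':   ['XI', 'XI', 'XI', 'XII', 'XII', 'XII'],
    'subject': ['Physics', 'Chemistry', 'Mathematics'] * 2,
    'marks':   [80, 75, 70, 60, 85, 90],
})

# subject-wise: average marks for each subject across classes
subject_mean = results.groupby('subject')['marks'].mean()
print(subject_mean)

# class-wise: average marks for each class across subjects
class_mean = results.groupby('class')['marks'].mean()
print(class_mean)
```

Either result is a Series, so it could be passed to plt.plot or plotted with its own .plot method in the same way as the marks list above.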
Write a program to create a bar chart of the five countries most affected by coronavirus in 2020. Read the data from a CSV file.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# the CSV is expected to hold one row per country, with columns
# 'c' (cases), 'r' (recovered) and 'd' (deaths)
a = pd.read_csv("C:\\Download\\Covid.csv")
# five group positions, 15 units apart: 1, 16, 31, 46, 61
x = np.linspace(1, 61, 5)
# place each country label under the middle bar of its group
plt.xticks(x + 6/2, ['China', 'Italy', 'India', 'Bangladesh', 'USA'])
plt.bar(x, a['c'], width=3, color='blue', label='Cases')
plt.bar(x + 3, a['r'], width=3, color='green', label='Recover')
plt.bar(x + 6, a['d'], width=3, color='red', label='Death')
plt.title("Most affected countries due to covid19")
plt.legend()
plt.xlabel("Countries")
plt.ylabel("Number")
plt.show()
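
The program above hard-codes the five country labels, so it relies on the CSV already containing exactly those five rows in that order. If the file held more countries, the five with the most cases could be picked first. The sketch below is only an illustration: the 'country' column is an assumption, and the numbers are placeholders, not real COVID-19 statistics.

```python
import io
import pandas as pd

# Placeholder values, NOT real statistics. The 'country' column is an
# assumption; 'c', 'r' and 'd' follow the column names used in the program.
csv_text = """country,c,r,d
A,500,400,50
B,900,700,90
C,100,80,5
D,700,600,40
E,300,200,20
F,800,650,60
G,200,150,10
"""

a = pd.read_csv(io.StringIO(csv_text))

# nlargest keeps the 5 rows with the highest values in column 'c'
top5 = a.nlargest(5, 'c')
print(top5)

# labels that could be passed to plt.xticks instead of a hard-coded list
tick_labels = top5['country'].tolist()
print(tick_labels)
```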

Draw the histogram based on the production of wheat in different years.

Year: 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018
Production: 4, 6, 7, 15, 24, 2, 19, 5, 16, 4

import pandas as pd
import matplotlib.pyplot as plt

data = {'Year': [2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018],
        'Production': [4, 6, 7, 15, 24, 2, 19, 5, 16, 4]}
d = pd.DataFrame(data)
print(d)
# histogram of the Production values, grouped into 5 bins
d.hist(column='Production', bins=5, grid=True)
plt.show()
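
A histogram groups the production values into bins and counts how many fall into each one; the years are not used. To check the bar heights without opening a plot window, the same five equal-width bins can be computed with numpy.histogram. This is a sketch for illustration and should correspond to the bins=5 setting used above.

```python
import numpy as np

production = [4, 6, 7, 15, 24, 2, 19, 5, 16, 4]

# 5 equal-width bins between the smallest (2) and largest (24) value
counts, edges = np.histogram(production, bins=5)

print(edges)    # bin boundaries: 2.0, 6.4, 10.8, 15.2, 19.6, 24.0
print(counts)   # number of values in each bin: 5, 1, 1, 2, 1
```

Five of the ten values lie in the lowest bin (2 to 6.4), so that bar is the tallest.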

The table shows passenger car fuel rates in miles per gallon for several years. Make a LINE GRAPH of the data. During which 2-year period did the fuel rate decrease?

YEAR:  2000  2002  2004  2006
RATE:  21.0  20.7  21.2  21.6

import matplotlib.pyplot as p

Yr = [2000, 2002, 2004, 2006]
rate = [21.0, 20.7, 21.2, 21.6]
p.plot(Yr, rate)
p.show()

Answer: the fuel rate decreased during 2000-2002 (from 21.0 to 20.7 miles per gallon).
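
The second part of the question can also be answered in code instead of by reading the graph. The sketch below is an illustration using pandas: Series.diff gives the change from the previous reading, and a negative change marks a period in which the rate fell. The names fuel, change and decreases are introduced here for the example.

```python
import pandas as pd

years = [2000, 2002, 2004, 2006]
rates = [21.0, 20.7, 21.2, 21.6]

fuel = pd.Series(rates, index=years)

# change from the previous reading (NaN for the first year)
change = fuel.diff()

# keep each (start, end) pair of years where the rate went down
decreases = [(start, end)
             for start, end in zip(years, years[1:])
             if change.loc[end] < 0]

print(decreases)   # [(2000, 2002)]
```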

The number of bed-sheets manufactured by a factory during five consecutive weeks is given below.

Week                   First  Second  Third  Fourth  Fifth
Number of Bed-sheets   600    850     700    300     900

Draw the bar graph representing the above data.

import matplotlib.pyplot as plt

x = ['First', 'Second', 'Third', 'Fourth', 'Fifth']
y = [600, 850, 700, 300, 900]

plt.title('Production By Factory')
plt.xlabel('Week')
plt.ylabel('No. of Bed Sheets')
plt.bar(x, y, color='blue', width=.50)
plt.show()
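
When the program runs where no window can be opened, or the chart is needed as an image, the figure can be saved instead of shown. The following sketch assumes the non-interactive Agg backend and writes the PNG to an in-memory buffer; the buffer stands in for a hypothetical file name such as 'bedsheets.png', which could be passed to savefig instead. It also writes each value above its bar.

```python
import io
import matplotlib
matplotlib.use("Agg")   # non-interactive backend: no window is opened
import matplotlib.pyplot as plt

weeks = ['First', 'Second', 'Third', 'Fourth', 'Fifth']
sheets = [600, 850, 700, 300, 900]

fig, ax = plt.subplots()
bars = ax.bar(weeks, sheets, color='blue', width=0.5)
ax.set_title('Production By Factory')
ax.set_xlabel('Week')
ax.set_ylabel('No. of Bed Sheets')

# write the number of bed-sheets just above each bar
for bar, value in zip(bars, sheets):
    ax.text(bar.get_x() + bar.get_width() / 2, value, str(value),
            ha='center', va='bottom')

# in-memory stand-in for a file path
buffer = io.BytesIO()
fig.savefig(buffer, format='png')
plt.close(fig)

png_bytes = buffer.getvalue()
print(len(png_bytes), "bytes of PNG data written")
```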
