Project_Prog
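Ans:
# importing pandas
import pandas as pd
# result_data is assumed to have been defined earlier
df = pd.DataFrame(result_data, index=None)
rows = len(df.axes[0])
cols = len(df.axes[1])
print("Number of Rows:", rows)
print("Number of Columns:", cols)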
Output:
Number of Rows: 6
Number of Columns: 4
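The same counts can also be read from the DataFrame's shape attribute. The sketch below is only an illustration: the contents of result_data are not given in this file, so the six records used here are hypothetical and were chosen to match the 6 rows and 4 columns reported above.

```python
import pandas as pd

# Hypothetical result_data (invented for illustration): 6 rows, 4 columns
result_data = {
    'name': ['Amit', 'Bina', 'Chetan', 'Divya', 'Esha', 'Farhan'],
    'class': ['XI', 'XI', 'XI', 'XII', 'XII', 'XII'],
    'subject': ['Physics', 'Chemistry', 'Physics', 'Chemistry', 'Physics', 'Chemistry'],
    'marks': [80, 75, 70, 78, 82, 68],
}
df = pd.DataFrame(result_data)

# shape is a (rows, columns) tuple
rows, cols = df.shape
print("Number of Rows:", rows)
print("Number of Columns:", cols)
```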
Write a Pandas program to select the names of persons whose height is between 5 and 5.5 (both values
inclusive).
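No data or answer is given for this question, so the following is only a possible sketch. The names and heights are hypothetical; the selection relies on Series.between(), which includes both end points by default.

```python
import pandas as pd

# Hypothetical sample data; the names and heights are made up for illustration
people = pd.DataFrame({
    'name': ['Asha', 'Ravi', 'Meena', 'John', 'Sara'],
    'height': [4.9, 5.0, 5.3, 5.5, 5.8],
})

# between() keeps values from 5 to 5.5, including 5 and 5.5 themselves
selected = people.loc[people['height'].between(5, 5.5), 'name']
print(selected.tolist())
```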
Write a Pandas program to select the rows where the score is between 15 and 20 (inclusive).
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
                      'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)
print("Rows where score between 15 and 20 (inclusive):")
print(df[df['score'].between(15, 20)])
Example:
Missing values: ?, --
Replace those values with NaN
Test Data:
   ord_no purch_amt    ord_date customer_id salesman_id
0   70001     150.5           ?        3002        5002
1     NaN    270.65  2012-09-10        3001        5003
2   70002     65.26         NaN        3001           ?
3   70004     110.5  2012-08-17        3003        5001
4     NaN     948.5  2012-09-10        3002         NaN
5   70005    2400.6  2012-07-27        3001        5002
6      --      5760  2012-09-10        3001        5001
7   70010         ?  2012-10-10        3004           ?
8   70003     12.43  2012-10-10          --        5003
9   70012    2480.4  2012-06-27        3002        5002
10    NaN    250.45  2012-08-17        3001        5003
11  70013    3045.6  2012-04-25        3001          --
Sample Solution:
Python Code :
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)
df = pd.DataFrame({
    'ord_no': [70001, np.nan, 70002, 70004, np.nan, 70005, "--", 70010, 70003, 70012, np.nan, 70013],
    'purch_amt': [150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 5760, "?", 12.43, 2480.4, 250.45, 3045.6],
    'ord_date': ['?', '2012-09-10', np.nan, '2012-08-17', '2012-09-10', '2012-07-27',
                 '2012-09-10', '2012-10-10', '2012-10-10', '2012-06-27', '2012-08-17', '2012-04-25'],
    'customer_id': [3002, 3001, 3001, 3003, 3002, 3001, 3001, 3004, "--", 3002, 3001, 3001],
    'salesman_id': [5002, 5003, "?", 5001, np.nan, 5002, 5001, "?", 5003, 5002, 5003, "--"]})
print("Original Orders DataFrame:")
print(df)
# Replace the placeholder strings "?" and "--" with NaN
result = df.replace({"?": np.nan, "--": np.nan})
print(result)
Sample Output:
Original Orders DataFrame:
   ord_no purch_amt    ord_date customer_id salesman_id
0   70001     150.5           ?        3002        5002
1     NaN    270.65  2012-09-10        3001        5003
2   70002     65.26         NaN        3001           ?
3   70004     110.5  2012-08-17        3003        5001
4     NaN     948.5  2012-09-10        3002         NaN
5   70005    2400.6  2012-07-27        3001        5002
6      --      5760  2012-09-10        3001        5001
7   70010         ?  2012-10-10        3004           ?
8   70003     12.43  2012-10-10          --        5003
9   70012    2480.4  2012-06-27        3002        5002
10    NaN    250.45  2012-08-17        3001        5003
11  70013    3045.6  2012-04-25        3001          --
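If the orders were stored in a CSV file instead of being typed into the program, the placeholders could probably be converted while the file is being read. The sketch below assumes a CSV layout with the same column names and uses only three rows of the test data, held in a string so that no file is needed; na_values is the read_csv parameter that lists extra strings to treat as NaN.

```python
import io
import pandas as pd

# Three rows of the test data, written as CSV text for illustration
csv_text = """ord_no,purch_amt,ord_date,customer_id,salesman_id
70001,150.5,?,3002,5002
--,5760,2012-09-10,3001,5001
70010,?,2012-10-10,3004,?
"""

# na_values tells read_csv to read "?" and "--" as NaN
orders = pd.read_csv(io.StringIO(csv_text), na_values=['?', '--'])
print(orders)

# Count of NaN values in each column
print(orders.isna().sum())
```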
Write a program to import and export data between Pandas and a CSV file.
import pandas as pd
# Import: read a CSV file into a DataFrame
df = pd.read_csv("C:\\Users\\Desktop\\covid19.csv")
print(df)
# Export: build a DataFrame and convert it to CSV
data = {'Name': ['Smith', 'Parker'], 'ID': [101, 102], 'Language': ['Python', 'JavaScript']}
info = pd.DataFrame(data)
print('DataFrame Values:\n', info)
# default CSV
csv_data = info.to_csv()
print('\nCSV String Values:\n', csv_data)
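Calling to_csv() without a file name only returns the CSV text. To show a complete export and import, the sketch below writes the same DataFrame to a file and reads it back. The temporary folder and the file name info.csv are assumptions made so that the example does not depend on a fixed path.

```python
import os
import tempfile
import pandas as pd

info = pd.DataFrame({'Name': ['Smith', 'Parker'], 'ID': [101, 102],
                     'Language': ['Python', 'JavaScript']})

# Assumed location: a temporary folder, so no fixed path is required
csv_path = os.path.join(tempfile.mkdtemp(), 'info.csv')

# Export: index=False leaves the row labels out of the file
info.to_csv(csv_path, index=False)

# Import: read the same file back into a new DataFrame
loaded = pd.read_csv(csv_path)
print(loaded)
```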
Given the school result data, analyse the performance of the students on different
parameters, e.g. subject-wise or class-wise.
import pandas as pd
import matplotlib.pyplot as plt
# Simple line chart with labels for the X and Y axes,
# a title for the chart and a colour for the line
subject = ['Physics','Chemistry','Mathematics', 'Biology','Computer']
marks = [80,75,70,78,82]
# To draw the line in red colour with star markers
plt.plot(subject,marks,'r',marker ='*')
# To write the title of the line chart
plt.title('Marks Scored')
# To put the label on the X axis
plt.xlabel('SUBJECT')
# To put the label on the Y axis
plt.ylabel('MARKS')
plt.show()
Output: [line chart titled 'Marks Scored'; X axis: SUBJECT, Y axis: MARKS]
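The question also asks for subject-wise or class-wise analysis. One possible way to get those figures is groupby(), sketched below. The school result data is not included in this file, so the classes, subjects and marks here are hypothetical.

```python
import pandas as pd

# Hypothetical result data; classes, subjects and marks are invented for illustration
results = pd.DataFrame({
    'class':   ['XI', 'XI', 'XI', 'XII', 'XII', 'XII'],
    'subject': ['Physics', 'Chemistry', 'Physics', 'Chemistry', 'Physics', 'Chemistry'],
    'marks':   [80, 75, 70, 78, 82, 68],
})

# Average marks for each subject
subject_avg = results.groupby('subject')['marks'].mean()
print(subject_avg)

# Average marks for each class
class_avg = results.groupby('class')['marks'].mean()
print(class_avg)
```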
Write a program to create a bar chart of the five countries most affected by the coronavirus in 2020. Read the
data from a CSV file.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
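# The CSV file is expected to have the columns 'c' (cases), 'r' (recovered) and 'd' (deaths)
a=pd.read_csv("C:\\Download\\Covid.csv")
x=np.linspace(1,61,5)
# Place each tick under the middle bar of its group
plt.xticks(x+6/2,['China','Italy','India','Bangladesh','USA'])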
plt.bar(x,a['c'],width=3,color='blue',label='Cases')
plt.bar(x+3,a['r'],width=3,color='green',label='Recover')
plt.bar(x+6,a['d'],width=3,color='red',label='Death')
plt.title("Most affected countries due to covid19")
plt.legend()
plt.xlabel("Countries")
plt.ylabel("Number")
plt.show()
The table shows passenger car fuel rates in miles per gallon for several years. Make a LINE GRAPH of the
data. During which 2-year period did the fuel rate decrease?
YEAR: 2000 2002 2004 2006
RATE: 21.0 20.7 21.2 21.6
import matplotlib.pyplot as p
Yr=[2000,2002,2004,2006]
rate=[21.0,20.7,21.2,21.6]
p.plot(Yr,rate)
p.show()
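The graph shows the trend, and the period asked for in the question can also be found by comparing each rate with the next one. The sketch below uses the values from the table; the variable names are chosen here only for illustration.

```python
years = [2000, 2002, 2004, 2006]
rates = [21.0, 20.7, 21.2, 21.6]

# Compare each reading with the next one and keep the periods where the rate fell
decreases = [(years[i], years[i + 1])
             for i in range(len(rates) - 1)
             if rates[i + 1] < rates[i]]
print(decreases)  # [(2000, 2002)]
```

The fuel rate decreased only in the period 2000 to 2002, from 21.0 to 20.7 miles per gallon.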
The number of bed-sheets manufactured by a factory during five consecutive weeks is given below.
Week                  First  Second  Third  Fourth  Fifth
Number of Bed-sheets    600     850    700     300    900
Draw the bar graph representing the above data.
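import matplotlib.pyplot as p
# Weeks and the number of bed-sheets made in each week
x = ['First', 'Second', 'Third', 'Fourth', 'Fifth']
y = [600, 850, 700, 300, 900]
p.title('Production By Factory')
p.xlabel('Week')
p.ylabel('No. of Bed Sheets')
p.bar(x,y,color='blue',width=.50)
p.show()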