IP Practical File
1. Object Population stores the population of four metro cities of India, and object
AvgIncome stores the total average income reported in the previous year in each of these metros.
Calculate the per capita income for each of these metro cities.
import pandas as pd
Pop= pd.Series([10927986, 12691836, 4631392, 4328063 ],
index = ['Delhi', 'Mumbai', 'Kolkata', 'Chennai'])
AvgInc= pd.Series([36360927986, 252325355, 4141577878, 34896899], index = ['Delhi',
'Mumbai', 'Kolkata', 'Chennai'])
perCapita = AvgInc / Pop
print ("Population in four metro cities ")
print (Pop)
print ("AvgIncome in four metro cities")
print (AvgInc)
print("Per Capita Income in four metro cities ")
print(perCapita)
Output:
Population in four metro cities
Delhi 10927986
Mumbai 12691836
Kolkata 4631392
Chennai 4328063
dtype: int64
AvgIncome in four metro cities
Delhi 36360927986
Mumbai 252325355
Kolkata 4141577878
Chennai 34896899
dtype: int64
Per Capita Income in four metro cities
Delhi 3327.321977
Mumbai 19.880918
Kolkata 894.240409
Chennai 8.062937
dtype: float64
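The division above works element by element because both Series carry the same index labels. The following is a minimal sketch, assuming a hypothetical reordered copy of the income Series (the names pop and avg_inc are illustrative), which suggests that pandas pairs values by label rather than by position:

```python
import pandas as pd

pop = pd.Series([10927986, 12691836, 4631392, 4328063],
                index=['Delhi', 'Mumbai', 'Kolkata', 'Chennai'])
# Same income figures as in the program above, listed in a different order
# (the reordering is an assumption made only for this illustration).
avg_inc = pd.Series([34896899, 4141577878, 252325355, 36360927986],
                    index=['Chennai', 'Kolkata', 'Mumbai', 'Delhi'])

# The division aligns on index labels, so each city is still divided
# by its own population even though the positions differ.
per_capita = avg_inc / pop
print(per_capita.round(2))
```

The per capita values for each city should agree with the output above, although the order of the rows in the result may differ.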
2. Write a program to create a DataFrame to store weight, age and names of 3 people. Print the
DataFrame and its transpose.
import pandas as pd
df = pd.DataFrame({'Weight': [78, 45, 67],
'Name': ['sam','arun', 'ajay'],'Age' : [56, 42,34]})
print('Original Dataframe')
print(df)
print('Transpose:')
print(df.T)
Output:
Original Dataframe
Weight Name Age
0 78 sam 56
1 45 arun 42
2 67 ajay 34
Transpose:
0 1 2
Weight 78 45 67
Name sam arun ajay
Age 56 42 34
3. Consider the following dataframe saleDf:
Target Sales
zoneA 56000 58000
zoneB 70000 68000
zoneC 75000 78000
zoneD 60000 61000
Write a program to add a column named Orders having values 6000, 6700, 6200 and 6000
respectively for the zones A, B, C and D. The program should also add a new row for a new
zone, zoneE. Add some dummy values in this row.
import pandas as pd
saleDf=pd.DataFrame({"Target":[56000,70000,75000,60000],"Sales":
[58000,68000,78000,61000]},index= ["zoneA","zoneB","zoneC","zoneD"])
saleDf['Orders'] = [6000, 6700, 6200, 6000]
saleDf.loc['zoneE', :]= [50000, 45000, 5000]
print(saleDf)
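Output:
Target Sales Orders
zoneA 56000 58000 6000
zoneB 70000 68000 6700
zoneC 75000 78000 6200
zoneD 60000 61000 6000
zoneE 50000 45000 5000
(Depending on the pandas version, the values may be displayed as floats, e.g. 56000.0.)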
4. From the dtf5 used above, create another DataFrame that does not contain the column
“Population” or the row Bangalore.
import pandas as pd
data = {
    'hospitals': [150, 540, 100, 34],
    'population': [601200, 671100, 621100, 67110]}
df = pd.DataFrame(data, index=['delhi', 'bangalore', 'kolkata', 'chennai'])
print(df)
# Build a new DataFrame without the column and the row; df itself stays unchanged.
df2 = df.drop(columns=['population']).drop(index=['bangalore'])
print(df2)
Output:
hospitals population
delhi 150 601200
bangalore 540 671100
kolkata 100 621100
chennai 34 67110
hospitals
delhi 150
kolkata 100
chennai 34
5. Given a Series that stores the area of some states in km². Write code to find out the
biggest and smallest areas from the given Series. Given series has been created like this :
Ser1 = pd.Series([34567, 890, 450, 67892, 34677, 78902, 256711, 678291, 637632, 25723,
2367, 11789, 345, 256517])
import pandas as pd
ser1 = pd.Series( [34567, 890, 450, 67892, 34677, 78902,256711, 678291, 637632, 25723,
2367, 11789, 345, 256517])
print("Top 3 biggest areas are:")
print(ser1.sort_values().tail(3))
print("3 smallest areas are :")
print(ser1.sort_values().head(3))
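Output:
Top 3 biggest areas are:
6 256711
8 637632
7 678291
dtype: int64
3 smallest areas are :
12 345
2 450
1 890
dtype: int64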
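The program above lists the three largest and three smallest areas by sorting. Since the question asks for the biggest and smallest areas, a minimal sketch of a more direct approach is given here, assuming the same Series; max(), min(), idxmax(), idxmin(), nlargest() and nsmallest() are standard pandas Series methods, while the variable names are illustrative:

```python
import pandas as pd

ser1 = pd.Series([34567, 890, 450, 67892, 34677, 78902, 256711, 678291,
                  637632, 25723, 2367, 11789, 345, 256517])

# Single biggest and smallest values, with the index positions that hold them.
biggest = ser1.max()
smallest = ser1.min()
print("Biggest area:", biggest, "at index", ser1.idxmax())
print("Smallest area:", smallest, "at index", ser1.idxmin())

# nlargest/nsmallest return the n extreme values without sorting the whole Series first.
top3 = ser1.nlargest(3)
bottom3 = ser1.nsmallest(3)
print(top3)
print(bottom3)
```

Note that nlargest(3) returns the values in descending order, whereas sort_values().tail(3) returns them in ascending order.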
import pandas as pd
s1 = pd.Series(data=[10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print("Original Data Series:")
print(s1)
s1= s1.reindex(index = ['b', 'c', 'd','a', 'e'])
print ("Data Series after changing the order of index:")
print(s1)
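Output:
Original Data Series:
a 10
b 20
c 30
d 40
e 50
dtype: int64
Data Series after changing the order of index:
b 20
c 30
d 40
a 10
e 50
dtype: int64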
Program code:
import pandas as pd
s4= pd.Series([67000,56000,50000,52000])
print("Original Series object s4:")
print(s4)
s4[1:3]=8000
print("Series object s4 after changing value:")
print(s4)
Output:
Original Series object s4:
0 67000
1 56000
2 50000
3 52000
dtype: int64
Series object s4 after changing value:
0 67000
1 8000
2 8000
3 52000
dtype: int64
8. Given a Series object s5. Write a program to calculate the cubes of the Series values.
import pandas as pd
s= pd.Series([5,7,9])
print("series object s")
print(s)
print("Cubes of s values")
print(s**3)
Output:
series object s
0 5
1 7
2 9
dtype: int64
Cubes of s values
0 125
1 343
2 729
dtype: int64
9. Write a program to print the DataFrame df, one column at a time.
import pandas as pd
data = {'Name': ["Ram", "Pam", "Sam"], 'Marks': [70, 95, 80]}
df = pd.DataFrame(data, index=['Rno.1', 'Rno.2', 'Rno.3'])
# items() yields (column name, column Series) pairs; iteritems() was removed in pandas 2.0.
for name, column in df.items():
    print(column)
    print("---------------------")
Output:
Rno.1 Ram
Rno.2 Pam
Rno.3 Sam
Name: Name, dtype: object
---------------------
Rno.1 70
Rno.2 95
Rno.3 80
Name: Marks, dtype: int64
---------------------
10. Given a DataFrame dtf6:
Hospitals Schools
Delhi 267 7636
Mumbai 425 9776
Kolkata 375 2524
Chennai 274 1625
Write a program to display top two rows' values of 'Schools' column and last 3 values of
'Hospitals' column.
import pandas as pd
# Column names must match the attribute names used below (Hospitals, Schools).
dtf6 = pd.DataFrame({"Hospitals": [267, 425, 375, 274],
                     "Schools": [7636, 9776, 2524, 1625]},
                    index=["Delhi", "Mumbai", "Kolkata", "Chennai"])
print(dtf6.Schools.head(2))
print(dtf6.Hospitals.tail(3))
Output:
Delhi 7636
Mumbai 9776
Name: Schools, dtype: int64
Mumbai 425
Kolkata 375
Chennai 274
Name: Hospitals, dtype: int64
11. Given DataFrame df:
Name Sex Position City Age Projects Budget
0 Rabita F Manager Bangalore 30 13 48
1 Evan M Programmer New delhi 27 17 13
2 Jia F Manager Chennai 32 16 32
3 Lalit M Manager Mumbai 40 20 21
4 Jaspreet M Programmer Chennai 28 21 17
5 suji F Programmer Bangalore 32 14 10
Write a program to print only the Name, Age and Position for all rows.
import pandas as pd
df = pd.DataFrame({
    "Name": ["Rabita", "Evan", "Jia", "Lalit", "Jaspreet", "suji"],
    "Sex": ['F', 'M', 'F', 'M', 'M', 'F'],
    "Position": ['Manager', 'Programmer', 'Manager', 'Manager', 'Programmer', 'Programmer'],
    "City": ['Bangalore', 'New delhi', 'Chennai', 'Mumbai', 'Chennai', 'Bangalore'],
    "Age": [30, 27, 32, 40, 28, 32],
    "Projects": [13, 17, 16, 20, 21, 14],
    "Budget": [48, 13, 32, 21, 17, 10]})
# Walk through the rows and print only the three required fields.
for i, row in df.iterrows():
    print(row['Name'], '\t', row["Age"], '\t', row['Position'])
Output:
Rabita 30 Manager
Evan 27 Programmer
Jia 32 Manager
Lalit 40 Manager
Jaspreet 28 Programmer
suji 32 Programmer
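The row-by-row loop above is one way to print the three fields. As an alternative, a minimal sketch that selects the columns directly with a list of column names is shown here; the variable name subset is illustrative, and to_string(index=False) is used only to hide the row numbers:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rabita", "Evan", "Jia", "Lalit", "Jaspreet", "suji"],
    "Sex": ['F', 'M', 'F', 'M', 'M', 'F'],
    "Position": ['Manager', 'Programmer', 'Manager', 'Manager', 'Programmer', 'Programmer'],
    "City": ['Bangalore', 'New delhi', 'Chennai', 'Mumbai', 'Chennai', 'Bangalore'],
    "Age": [30, 27, 32, 40, 28, 32],
    "Projects": [13, 17, 16, 20, 21, 14],
    "Budget": [48, 13, 32, 21, 17, 10]})

# Indexing with a list of column names returns a new DataFrame with just those columns.
subset = df[['Name', 'Age', 'Position']]
print(subset.to_string(index=False))
```

This avoids an explicit Python loop, which tends to matter more as the number of rows grows.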
12. Given a series nfib that contains reversed Fibonacci numbers with Fibonacci numbers
as shown below:
[0, -1, -1, -2, -3, -5, -8, -13, -21, -34, 0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
Program code:
import matplotlib.pyplot as plt
import numpy as np
n= [0, -1, -1, -2, -3, -5, -8, -13, -21, -34, 0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
plt.plot(range(-10, 10), n, 'mo', markersize= 3, markeredgecolor = 'b',linestyle = 'solid')
plt.show()
Output:
13. Consider the reference table 3.1. Write a program to plot a bar chart from the
Medals won by Australia. In the same chart, plot medals won by India too.
Output:
15. A survey gathers height and weight of 100 participants and recorded the participants
ages as:
Ages = [1, 1,2,3,5,7,8,9,10, 10,11,13,13,15, 16,17,18, 19,20, 21, 21,23, 24,24, 24, 25,25,
25,25,26, 26, 26,27,27, 27, 27,27, 29, 30, 30, 30, 30, 31,33, 34, 34, 34, 35, 36, 36, 37, 37,
37,38, 38, 39, 40,40, 41, 41,42, 43,45,45,46,46, 46, 47,48,48,49, 50, 51,51, 52, 52, 53, 54,
55,56,57,58,60, 61,63,65,66,68, 70, 72,74, 75,77,81, 83, 84,87,89,90,91]
Write a program to plot a histogram from above data with 20 bins
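No program is listed for this question, so the following is a minimal sketch of one possible solution using plt.hist with bins=20. The non-interactive Agg backend, the output file name ages_hist.png, and the title and axis labels are assumptions made so that the script runs without a display; in an interactive session plt.show() could be used instead of savefig.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

ages = [1, 1, 2, 3, 5, 7, 8, 9, 10, 10, 11, 13, 13, 15, 16, 17, 18, 19, 20, 21, 21, 23,
        24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 29, 30, 30, 30, 30,
        31, 33, 34, 34, 34, 35, 36, 36, 37, 37, 37, 38, 38, 39, 40, 40, 41, 41, 42, 43,
        45, 45, 46, 46, 46, 47, 48, 48, 49, 50, 51, 51, 52, 52, 53, 54, 55, 56, 57, 58,
        60, 61, 63, 65, 66, 68, 70, 72, 74, 75, 77, 81, 83, 84, 87, 89, 90, 91]

# bins=20 splits the range of ages into 20 equal-width intervals.
counts, bin_edges, _ = plt.hist(ages, bins=20, edgecolor='black')
plt.title('Ages of participants')      # illustrative title
plt.xlabel('Age')
plt.ylabel('Number of participants')
plt.savefig('ages_hist.png')           # hypothetical file name
```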
mu = 100
sigma = 15
x = mu + sigma * np.random.randn(10000)
y = mu + 30 * np.random.randn(10000)
Write a program to plot this data on a cumulative bar-stacked horizontal histogram with
both x and y.
import numpy as np
import matplotlib.pyplot as plt
mu = 100
sigma =15
x= mu + sigma*np.random.randn(10000)
y=mu +30*np.random.randn(10000)
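# orientation='horizontal' draws the bars sideways, as the question requires.
plt.hist([x, y], bins=100, histtype='barstacked', cumulative=True, orientation='horizontal')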
plt.title('Histogram')
plt.show()
Output:
18. Write a program to create a horizontal bar chart from two data sequences as given
below:
means = [20, 35, 30, 35, 27], stds= [2, 3, 4, 1, 2]
Make sure to show legends.
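No program is listed for this question, so the following is a minimal sketch of one possible reading: both sequences are drawn as side-by-side horizontal bars with barh, and the label arguments feed the legend. The group labels G1 to G5, the bar height, the Agg backend and the output file name are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

means = [20, 35, 30, 35, 27]
stds = [2, 3, 4, 1, 2]

positions = np.arange(len(means))  # one slot on the y-axis per pair of values
height = 0.4                       # thickness of each bar

fig, ax = plt.subplots()
# The second set of bars is shifted by one bar height so the two do not overlap.
bars_means = ax.barh(positions, means, height=height, label='means')
bars_stds = ax.barh(positions + height, stds, height=height, label='stds')
ax.set_yticks(positions + height / 2)
ax.set_yticklabels(['G1', 'G2', 'G3', 'G4', 'G5'])  # hypothetical group labels
legend = ax.legend()               # builds the legend from the label arguments
fig.savefig('means_stds_barh.png')  # hypothetical file name
```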
Output:
19. Write a program to read from a CSV file Employee.csv and create a dataframe from it
but dataframe should not use file's column header rather should use own column
numbers as 0, 1, 2 and so on.
import pandas as pd
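# header=None numbers the columns 0, 1, 2, ...; skiprows=1 skips the file's own header row.
df = pd.read_csv('Employee.csv', header=None, skiprows=1)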
print(df)
Output:
0 1 2 3
0 1001 trupti manager 5663665
1 1002 sam manger 46436
2 1003 pam ca 3634666
3 1004 arun clerk 252363
4 1005 shreya clerk 25346666
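To illustrate how header=None interacts with a file that already has a header row, here is a minimal sketch that reads from an in-memory string instead of Employee.csv. The header names in csv_text are an assumption (the actual file's header is not shown), and only two of the records above are reproduced.

```python
import io
import pandas as pd

# Hypothetical file contents: a header row followed by two of the records shown above.
csv_text = ("EmpID,EmpName,Designation,Salary\n"
            "1001,trupti,manager,5663665\n"
            "1002,sam,manger,46436\n")

# header=None alone treats the header line as an ordinary data row.
with_header_row = pd.read_csv(io.StringIO(csv_text), header=None)

# Adding skiprows=1 drops that first line, leaving only the records.
without_header_row = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=1)

print(with_header_row)
print(without_header_row)
```

In both cases the columns are numbered 0 to 3; the difference is whether the old header text appears as the first row of data.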
20. Write a program to read from a CSV file Employee.csv and create a dataframe from it
but dataframe should not use file's column header rather should use own column
headings as EmpID, EmpName, Designation and Salary. Also print the maximum salary
given to an employee
import pandas as pd
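# skiprows=1 drops the file's own header row; names supplies the new headings.
df = pd.read_csv('Employee.csv', header=None, skiprows=1,
                 names=['EmpID', 'EmpName', 'Designation', 'Salary'])
print(df)
print("Maximum salary:", df['Salary'].max())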
Output:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('C:/Users/Arjun/OneDrive/Documents/sport.csv')
plt.plot(df['Sport'], df['Competitions'], marker='o')
plt.title('Competitions vs Sport')
plt.xlabel('Sport')
plt.ylabel('Competitions')
plt.xticks(rotation=45)
plt.show()
Output:
22. Previous examples 9 and 10 created csv files without NaN values. Write a program to
store the data of allDf dataframe in a csv file along with NaN values stored as Null and
separator as '~'.
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', None],
'Age': [25, 30, None, 22],
'Salary': [50000, None, 60000, 45000]
}
allDf = pd.DataFrame(data)
allDf.to_csv('output_tilde.csv', sep='~', na_rep='Null', index=False)
print("CSV file with '~' separator has been created as 'output_tilde.csv'")
Output:
CSV file with '~' separator has been created as 'output_tilde.csv'
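To check what to_csv actually writes with sep='~' and na_rep='Null', the file can be read back. The following is a minimal sketch, assuming a temporary directory for the file so that nothing is left in the working folder; na_values=['Null'] is passed because 'Null' with this capitalisation is not among the strings pandas treats as missing by default.

```python
import os
import tempfile
import pandas as pd

allDf = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', None],
    'Age': [25, 30, None, 22],
    'Salary': [50000, None, 60000, 45000]
})

# Write into a temporary directory (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'output_tilde.csv')
allDf.to_csv(path, sep='~', na_rep='Null', index=False)

# Inspect the raw text: fields are separated by '~' and missing values appear as Null.
with open(path) as f:
    raw_text = f.read()
print(raw_text)

# Read the file back, telling pandas that 'Null' marks a missing value.
restored = pd.read_csv(path, sep='~', na_values=['Null'])
print(restored)
```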
23. Write modified program of example 10. Take marks from user and fetch those records
which have marks more than input marks.
import pandas as pd
import mysql.connector as sqltor
mycon = sqltor.connect(host="localhost", user="root", passwd="MyPass", database="test")
if mycon.is_connected():
    mks = float(input("Enter marks :"))
    # Pass the value as a query parameter instead of formatting it into the SQL string.
    qry = "Select * from student where marks > %s ;"
    mdf = pd.read_sql(qry, mycon, params=(mks,))
    print("Student details with marks >", mks)
    print(mdf)
else:
    print("MySQL Connection problem")
Output:
Enter marks : 75
Student details with marks > 75.0
RollNo Name Marks Grade Section Project
0 103 Simran 81.2 A B Evaluated
1 106 Arsiya 91.6 A+ B submitted
24. Write a program to take the marks range from the user, i.e., the lower and upper
limits of the range, and fetch those records from the student table having marks in
this range.
import pandas as pd
import mysql.connector as sqltor
mycon = sqltor.connect(host="localhost", user="root", passwd="MyPass", database="test")
if mycon.is_connected():
    lmks = float(input("Enter lower limit of marks range: "))
    hmks = float(input("Enter higher limit of marks range: "))
    # Both limits are passed as query parameters rather than formatted into the SQL string.
    qry = "Select * from student where marks between %s and %s;"
    mdf = pd.read_sql(qry, mycon, params=(lmks, hmks))
    print("Student details with marks in the range (", lmks, "-", hmks, "): ")
    print(mdf)
else:
    print("MySQL Connection problem")
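The range query can be tried without a MySQL server. The following is a minimal sketch that swaps MySQL for an in-memory SQLite database from the standard library; the table contents are illustrative (Simran and Arsiya are taken from the earlier output, Asha is a made-up record), and the limits are fixed instead of read with input(). Note that sqlite3 uses ? as its placeholder, whereas mysql.connector uses %s.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database standing in for the MySQL 'test' database (assumption).
con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE student (RollNo INTEGER, Name TEXT, Marks REAL)")
con.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [(101, 'Asha', 62.5),      # hypothetical record
                 (103, 'Simran', 81.2),
                 (106, 'Arsiya', 91.6)])
con.commit()

lmks, hmks = 70.0, 90.0   # fixed limits in place of input()

# BETWEEN is inclusive at both ends; the limits are supplied through params.
qry = "SELECT * FROM student WHERE Marks BETWEEN ? AND ?"
mdf = pd.read_sql(qry, con, params=(lmks, hmks))
print(mdf)
con.close()
```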
import pandas as pd
from sqlalchemy import create_engine
import pymysql
Output:
27. Write a program to write only the top 4 rows of the dataframe allDf used in previous
example, in sales2 of test database on MySQL. If the table exists, then the records should
get appended to the table.
import pandas as pd
from sqlalchemy import create_engine
import pymysql
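The listing above stops after its imports. The following is a minimal sketch of the remaining steps, not the original program: it takes the top 4 rows with head(4) and writes them with to_sql using if_exists='append'. To stay runnable without a server, it swaps MySQL for an in-memory SQLite connection, and the contents of allDf are hypothetical because the earlier example is not shown here.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for allDf from the previous example.
allDf = pd.DataFrame({
    'Item': ['Pen', 'Pencil', 'Eraser', 'Ruler', 'Marker', 'Notebook'],
    'Sales': [120, 340, 95, 60, 150, 410]
})

# Assumption: SQLite in memory instead of MySQL. With MySQL, the con argument would be
# an SQLAlchemy engine, e.g. create_engine('mysql+pymysql://user:password@localhost/test').
con = sqlite3.connect(':memory:')

top4 = allDf.head(4)   # only the first four rows

# if_exists='append' creates the table when it is missing and adds rows when it exists.
top4.to_sql('sales2', con, if_exists='append', index=False)
top4.to_sql('sales2', con, if_exists='append', index=False)  # second call appends again

row_count = pd.read_sql("SELECT COUNT(*) AS n FROM sales2", con)['n'].iloc[0]
print("Rows in sales2:", row_count)
con.close()
```

Running to_sql twice is done here only to show the append behaviour: the table ends up with the four rows written twice rather than being replaced.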