
Pyq Solution

The document discusses various data analysis and visualization concepts and techniques. It provides examples of pandas code for data manipulation and aggregation. The document also contains questions related to dataframes, groupby operations, merging and other pandas functionality.

Uploaded by

Shiv

Unique Paper Code: 32347507

Name of the paper: Data Analysis and Visualisation


Name of the Course: B.Sc. (Hons.) Computer Science
Semester: V

Duration: 3 Hours Maximum Marks: 75

Question No. 1 is compulsory.


Attempt any four questions out of Q. 2 to Q. 7.
Parts of a question must be answered together.

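Q1 a) Give the output of the following code.

i. (2)
import pandas as pd
obj3 = pd.Series(['wow', 'good', 'great'],
                 index=[0, 2, 4])
obj3.reindex(range(6), method='ffill')
obj3

Output
0      wow
1      wow
2     good
3     good
4    great
5    great
dtype: object
This is the value returned by the reindex call. reindex does not modify obj3 in place, so obj3 itself still holds only the labels 0, 2 and 4.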

ii. matrix = [[j for j in range(3)] for i in range(3)]
print(matrix)

Output
[[0, 1, 2], [0, 1, 2], [0, 1, 2]]

iii. import pandas as pd
df = pd.DataFrame([[1, 1, 1], [2, 2, 2], [1, 2, 1], [2, 1, 1]],
                  index=['one', 'two', 'three', 'four'],
                  columns=pd.Index(['A', 'B', 'C'], name='MyPlot'))
Give the output for df.plot.bar().
Output
AxesSubplot(0.125,0.125;0.775x0.755)
[Figure: bar chart with legend titled MyPlot; y-axis ticks run from 0.00 to 2.00 in steps of 0.25]

b) What is a pivot table? Give one example. (2)

A pivot table is a data summarization tool frequently found in spreadsheet
programs and other data analysis software. It aggregates a table of data by one
or more keys, arranging the data in a rectangle with some of the group keys along
the rows and some along the columns. Pivot tables in Python with pandas are
made possible through the groupby facility combined with reshape operations
utilizing hierarchical indexing.
For example: tips.pivot_table(index=['day', 'smoker'])
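As a minimal sketch of the idea, the example below summarises a small table with pivot_table, whose default aggregation is the mean. The tips data here is a hypothetical stand-in with invented values, not the dataset the answer refers to.

```python
import pandas as pd

# Hypothetical stand-in for the tips data; values are invented for illustration.
tips = pd.DataFrame({
    'day': ['Sat', 'Sat', 'Sun', 'Sun'],
    'smoker': ['No', 'Yes', 'No', 'No'],
    'tip': [2.0, 3.0, 4.0, 6.0],
})

# Both group keys along the rows; the default aggregation is the mean.
by_day_smoker = tips.pivot_table(index=['day', 'smoker'])

# Moving one key to the columns gives the rectangular layout described above.
wide = tips.pivot_table(values='tip', index='day', columns='smoker')

print(by_day_smoker)
print(wide)
```

Combinations that never occur in the data (here Sunday smokers) show up as NaN in the wide layout.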
c) Provide the output of the following code, given the string objects (3)
s = '3.1456' and
c = """This is a long string
that spans multiple lines"""
i. fval = float(s)
type(fval)
Output
float
ii. bool(s)
Output
True
iii. c.count('\n')
Output
1, taking c exactly as written above (one line break inside the quotes)
d) Consider a list seq = [1, 2, 0, 4, 6, 5, 2, 1]. Write a code to find the sum of (2)
the elements that come before the element 5.
Answer
sequence = [1, 2, 0, 4, 6, 5, 2, 1]
total_until_5 = 0
for value in sequence:
    if value == 5:
        break
    total_until_5 += value
print(total_until_5)

e) Consider the given arr = [1, 2, 8, 9, 3, 4, 7, 5, 10, 6]. What will be the resulting (3)
array if these operations are performed: arr[2:5], arr[-5:-1] and arr[::2]?
Answer
i. [8, 9, 3]
ii. [4, 7, 5, 10]
iii. [1, 8, 3, 7, 10]
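The three slices can be checked directly. This sketch uses a plain Python list; the same slice syntax applies to a NumPy array.

```python
arr = [1, 2, 8, 9, 3, 4, 7, 5, 10, 6]

first = arr[2:5]      # indices 2, 3 and 4
second = arr[-5:-1]   # fifth-from-last up to, but excluding, the last element
third = arr[::2]      # every second element, starting at index 0

print(first, second, third)
```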

f) Create a dataframe with four rows and three columns and populate it with (3)
random values. Index of the rows are 'Utah', 'Ohio', 'Texas', 'Oregon' and
column indexes are 'b', 'd', 'e'. Write a lambda function to compute the
difference between the maximum and minimum of each column.

Answer

import numpy as np
import pandas as pd
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

f = lambda x: x.max() - x.min()

frame.apply(f)
g) Create an array num of size 2 x 3 filled with all zeros then insert [[1, 2, 3], (3)
[4, 5, 6]] into the array. Identify the shape of the array num.

Answer
import numpy as np
a = [[1, 2, 3], [4, 5, 6]]
num = np.zeros((2, 3))
num[:] = a
num.shape   # (2, 3)

h) Write a code to read a CSV file with new delimiter as ':' and line terminator (3)
as '\n'.

Answer
import csv
class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ':'
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL
# f is an already opened file object
reader = csv.reader(f, dialect=my_dialect)
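A self-contained way to try a custom dialect is to read from an in-memory buffer instead of a file. In this sketch the class name ColonDialect and the sample text are assumptions made for illustration; doublequote is set explicitly so the dialect does not rely on defaults.

```python
import csv
import io

class ColonDialect(csv.Dialect):
    # Colon-separated fields, one record per '\n'-terminated line.
    delimiter = ':'
    lineterminator = '\n'
    quotechar = '"'
    doublequote = True
    quoting = csv.QUOTE_MINIMAL

# In-memory text stands in for a real file (hypothetical sample data).
f = io.StringIO('name:city\nAsha:"New:Delhi"\n')
rows = list(csv.reader(f, dialect=ColonDialect))
print(rows)
```

The quoted field shows why quotechar matters: a colon inside quotes is kept as data rather than treated as a delimiter.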
i) Consider the following piece of code and give the output. (3)
import pandas as pd
a = pd.DataFrame({'id': [1, 2, 9, 10],
                  'val': ['a', 'b', 'c', 'd']})
b = pd.DataFrame({'id': [1, 7, 10, 12, 13, 7],
                  'val': ['p', 'q', 'r', 's', 't', 'u']})
c = pd.merge(a, b, on='id', how='right')

i. How many 'NaN' values are in the dataframe 'c'?

   id val_x val_y
0   1     a     p
1   7   NaN     q
2  10     d     r
3  12   NaN     s
4  13   NaN     t
5   7   NaN     u

There are four NaN values, all in val_x: the ids 7, 12 and 13 do not occur in a.
ii. Drop duplicate values from dataframe 'b' and keep the last
duplicated value.

Answer
b.drop_duplicates(keep='last')
j) Generate DatetimeIndex of length 20 where each index will be Tuesday of (3)
the third week of a month starting from 10-Jan-2022.

Answer
import pandas as pd
dates = pd.date_range('2022-01-10', periods=20, freq='WOM-3TUE')
dates
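The 'WOM-3TUE' alias can be sanity-checked by inspecting the generated dates. This sketch reads "Tuesday of the third week" as the third Tuesday of each month, which is what the week-of-month offset produces.

```python
import pandas as pd

dates = pd.date_range('2022-01-10', periods=20, freq='WOM-3TUE')

# Every entry should be a Tuesday (dayofweek == 1) falling on day 15 to 21.
all_tuesdays = bool((dates.dayofweek == 1).all())
in_third_week = bool(((dates.day >= 15) & (dates.day <= 21)).all())

print(dates[:3])
print(all_tuesdays, in_third_week)
```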

k) Consider dataframe df (4)

import pandas as pd
import numpy as np
df = pd.DataFrame({'key': ['a', 'b', 'c'] * 4,
                   'value': np.arange(12.0)})

What will be the output of the following statements?

i. Print the dataframe df.
df = pd.DataFrame({'key': ['a', 'b', 'c'] * 4,
                   'value': np.arange(12.0)})
df
ii. Write a code to group the dataframe using key.
g = df.groupby('key').value
print(g)
iii. Multiply each group value by 2.
g.transform(lambda x: x * 2)
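The difference between transform and an aggregation is easiest to see side by side. In this sketch the names doubled and group_means are introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'] * 4,
                   'value': np.arange(12.0)})
g = df.groupby('key').value

# transform returns a result aligned with the original rows (12 values).
doubled = g.transform(lambda x: x * 2)

# An aggregation such as mean returns one value per group (3 values).
group_means = g.mean()

print(doubled.head())
print(group_means)
```

Printing g itself only shows a SeriesGroupBy object, because nothing is computed until a method such as mean or transform is called.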

Q2 a) Consider a dataframe df as (6)

import pandas as pd
import numpy as np
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.random.randn(5),
                   'data2': np.random.randn(5)})
Provide the output for the following:
i. print(df)
  key1 key2     data1     data2
0    a  one  2.051693 -2.432268
1    a  two  0.196488 -0.134805
2    b  one  1.690703 -1.340778
3    b  two -0.283880 -1.261686
4    a  one -1.771815 -1.581653

ii. m1 = df['data1'].groupby([df['key1'], df['key2']]).mean()
print(m1)
key1  key2
a     one     0.139939
      two     0.196488
b     one     1.690703
      two    -0.283880
Name: data1, dtype: float64
iii. m2 = df['data1'].groupby([df['key1']]).mean()
key1
a    0.158789
b    0.703411
Name: data1, dtype: float64

iv. pieces = dict(list(df.groupby('key1')))
pieces['b']
  key1 key2     data1     data2
2    b  one  1.690703 -1.340778
3    b  two -0.283880 -1.261686

v. for (k1, k2), group in df.groupby(['key1', 'key2']):
    print((k1, k2))
    print(group)
('a', 'one')
  key1 key2     data1     data2
0    a  one  2.051693 -2.432268
4    a  one -1.771815 -1.581653
('a', 'two')
  key1 key2     data1     data2
1    a  two  0.196488 -0.134805
('b', 'one')
  key1 key2     data1     data2
2    b  one  1.690703 -1.340778
('b', 'two')
  key1 key2    data1     data2
3    b  two -0.28388 -1.261686
b) Give output of the following code. Justify.
i. val = ['foo', 2, [4, 2]] (2)
val[2] = (5, 4)
print(val)
Output
['foo', 2, (5, 4)]
A list is mutable, so the element at index 2 can be replaced.
ii. var = (3, 5, (4, 5)) (2)
var[1] = 'two'
print(var)

TypeError                                 Traceback (most recent call last)
<ipython-input-15-36f8£7bcl575> in <module>
      1 var=(3, 5, (4,5))
----> 2 var[1]='two'
      3 print(var)

TypeError: 'tuple' object does not support item assignment
A tuple is immutable, so assigning to one of its items raises TypeError.


Q3 a) Given the following list of strings (5)
List1 = ['Amazon', 'Amazing Amazon', 'Apple', 'Microsoft', 'Apple is good
for health', 'I like Microsoft'].
Using 'List1', generate the following dictionary 'Anydict' where key is the
count of words in a string and value is the list of strings having that count.
Anydict = {1: ['Amazon', 'Apple', 'Microsoft'], 2: ['Amazing Amazon'], 3: ['I
like Microsoft'], 5: ['Apple is good for health']}.
Answer
import pandas as pd
List1 = ['Amazon', 'Amazing Amazon', 'Apple', 'Microsoft',
         'Apple is good for health', 'I like Microsoft']
Anydict = {}
for i, v in enumerate(List1):
    n = len(v.split())
    if n not in Anydict:
        Anydict[n] = [v]
    else:
        Anydict[n].append(v)
Anydict
s = pd.Series(Anydict)
s
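The same grouping can be written more compactly with collections.defaultdict, which removes the explicit membership test. This is an alternative sketch, not the form the answer above uses; the name by_count is introduced for illustration.

```python
from collections import defaultdict

List1 = ['Amazon', 'Amazing Amazon', 'Apple', 'Microsoft',
         'Apple is good for health', 'I like Microsoft']

# Missing keys start out as an empty list, so append always works.
by_count = defaultdict(list)
for s in List1:
    by_count[len(s.split())].append(s)

Anydict = dict(by_count)
print(Anydict)
```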
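b) Write a code to read the data from a csv file. Find the number of rows and (5)
columns in the data, replace missing values with zero, and remove duplicate
values. Write the modified data back to the original file.

Answer
import pandas as pd
df = pd.read_csv('examples/ex1.csv')
print(df.shape)              # (number of rows, number of columns)
df = df.fillna(0)            # fillna returns a new frame, so reassign
df = df.drop_duplicates()    # drop_duplicates also returns a new frame
df.to_csv('examples/ex1.csv', index=False)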

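Q4 a) What is the use of generator function? Write a generator function to print (4)
square of first n natural numbers where n is user input.

Answer
A generator function produces its values one at a time with yield, on demand, instead of building the whole sequence in memory.
def square_of_sequence(x):
    for i in range(1, x + 1):
        yield i * i

n = int(input())
squares = square_of_sequence(n)

for sqr in squares:
    print(sqr)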

b) Write a program to draw a scatter plot comparing marks of (6)
Mathematics = [88, 92, 80, 89, 100, 80, 60, 100, 80, 34] and Science = [35,
79, 79, 48, 100, 88, 32, 45, 20, 30] subjects.
Import the necessary libraries.
Title the plot as 'Marks Comparison' and label y-axis as 'Marks Scored'.
Assign red color to mathematics marks points and blue color to science marks
points.
Answer
import matplotlib.pyplot as plt
import pandas as pd
math_marks = [88, 92, 80, 89, 100, 80, 60, 100, 80, 34]
science_marks = [35, 79, 79, 48, 100, 88, 32, 45, 20, 30]
marks_range = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
plt.scatter(marks_range, math_marks, label='Math marks', color='r')
plt.scatter(marks_range, science_marks, label='Science marks', color='b')
plt.title('Marks Comparison')
plt.ylabel('Marks Scored')
plt.legend()
plt.show()
Q5 a) Consider the following data frame Family containing a family name, gender
of the family member and her/his monthly income and expenditure in each
record.

Name    Gender  Monthly Income  Expenditure
Shahin  Male         114000.00     58000.00
Vimal   Male          65000.00     32000.00
Vimala  Female        69500.00     38500.00
Vimala  Female       155000.00     70000.00
Karan   Male         103000.00     52000.00
Shahin  Male          55000.00     18000.00
Seema   Female       112400.00     60000.00
Seema   Female        81030.00     25000.00
Vimal   Male          71900.00     30000.00
i. Find correlation between Monthly Income and Expenditure. ()
data['Monthly Income'].corr(data['Expenditure'])
ii. Use map function to convert each value of Name into
uppercase. ()
transform = lambda x: x.upper()
da = data.Name.map(transform)
da
iii. Create a new data frame Info having a hierarchical index on
columns Name and Gender. (2)
Info = data.set_index(['Name', 'Gender'])
Info
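To run the three answers end to end, the Family table first has to be built as a DataFrame. The sketch below does that under the assumption, taken from the answers, that the frame is bound to the name data; str.upper is used in place of the lambda purely for brevity.

```python
import pandas as pd

# The Family table from the question, bound to the name used in the answers.
data = pd.DataFrame({
    'Name': ['Shahin', 'Vimal', 'Vimala', 'Vimala', 'Karan',
             'Shahin', 'Seema', 'Seema', 'Vimal'],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Male',
               'Male', 'Female', 'Female', 'Male'],
    'Monthly Income': [114000.0, 65000.0, 69500.0, 155000.0, 103000.0,
                       55000.0, 112400.0, 81030.0, 71900.0],
    'Expenditure': [58000.0, 32000.0, 38500.0, 70000.0, 52000.0,
                    18000.0, 60000.0, 25000.0, 30000.0],
})

r = data['Monthly Income'].corr(data['Expenditure'])   # Pearson correlation
upper_names = data['Name'].map(str.upper)              # element-wise mapping
Info = data.set_index(['Name', 'Gender'])              # two-level row index

print(round(r, 3))
print(upper_names.head())
print(Info.head())
```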
b) Consider the data array = [0.9296, 0.3164, 0.1839, 0.2046, 0.5677, 0.5955,
0.9645, 0.6532, 0.7489, 0.6536] of 10 floating-point values. Write code for
following:
i. Create 5 bins of the array using the cut method. (1)
import pandas as pd
arr = [0.9296, 0.3164, 0.1839, 0.2046, 0.5677, 0.5955, 0.9645,
       0.6532, 0.7489, 0.6536]
pd.cut(arr, 5)
ii. Create 5 bins of the array using the qcut method. (1)
import pandas as pd
arr = [0.9296, 0.3164, 0.1839, 0.2046, 0.5677, 0.5955, 0.9645,
       0.6532, 0.7489, 0.6536]
pd.qcut(arr, 5)
iii. Create 5 bins of the array with precision 2 using cut method. (3)
Also explain the usage of parameter precision.
import pandas as pd
arr = [0.9296, 0.3164, 0.1839, 0.2046, 0.5677, 0.5955, 0.9645,
       0.6532, 0.7489, 0.6536]
pd.cut(arr, 5, precision=2)
The precision parameter sets the number of decimal places used when storing and displaying the bin labels.
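The practical difference between the two methods is that cut makes bins of equal width while qcut makes bins of (roughly) equal size. A short sketch that counts the members of each bin makes this visible; the variable names are introduced for illustration.

```python
import numpy as np
import pandas as pd

arr = [0.9296, 0.3164, 0.1839, 0.2046, 0.5677, 0.5955, 0.9645,
       0.6532, 0.7489, 0.6536]

equal_width = pd.cut(arr, 5)    # bins span equal ranges of values
equal_size = pd.qcut(arr, 5)    # bins are cut at the sample quantiles

# codes holds the bin number assigned to each value.
width_counts = np.bincount(equal_width.codes, minlength=5)
size_counts = np.bincount(equal_size.codes, minlength=5)

print(width_counts)
print(size_counts)
```

With ten distinct values and five quantile bins, qcut places two values in each bin, whereas the equal-width bins from cut can be uneven or even empty.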

Q6 a) Consider the following code:

import pandas as pd
left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'],
                     'key2': ['one', 'two', 'one'],
                     'lval': [1, 2, 3]})
right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'],
                      'key2': ['one', 'one', 'one', 'two'],
                      'rval': [4, 5, 6, 7]})

Provide output of the following:

i. pd.merge(left, right, on=['key1']) (2)

  key1 key2_x  lval key2_y  rval
0  foo    one     1    one     4
1  foo    one     1    one     5
2  foo    two     2    one     4
3  foo    two     2    one     5
4  bar    one     3    one     6
5  bar    one     3    two     7

ii. prop_cumsum = left.sort_values(by='key2',
ascending=False).lval.cumsum() (2)
print(prop_cumsum)

1    2
0    3
2    6
Name: lval, dtype: int64
iii. left.append(right) (2)
  key1 key2  lval  rval
0  foo  one   1.0   NaN
1  foo  two   2.0   NaN
2  bar  one   3.0   NaN
0  foo  one   NaN   4.0
1  foo  one   NaN   5.0
2  bar  one   NaN   6.0
3  bar  two   NaN   7.0
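DataFrame.append was removed in pandas 2.0, so on a current installation part iii needs pd.concat, which stacks the frames the same way. The sketch below reruns parts i and iii on that basis; merged and stacked are names introduced for illustration.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'],
                     'key2': ['one', 'two', 'one'],
                     'lval': [1, 2, 3]})
right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'],
                      'key2': ['one', 'one', 'one', 'two'],
                      'rval': [4, 5, 6, 7]})

# Inner join on key1 only; the clashing key2 columns get _x / _y suffixes.
merged = pd.merge(left, right, on='key1')

# Row-wise stacking; columns missing from one frame are filled with NaN.
stacked = pd.concat([left, right])

print(merged)
print(stacked)
```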

b) Consider the data given below:

EMP ID  EMP NAME  SALARY
1       Satish      5000
2       Vani        7500
3       Ramesh     10000
4       Rajesh      8000
5       Virat       9500

Write a code for the following:

i. Create a dataframe for the above data. (2)

import pandas as pd
Em = pd.DataFrame({'Em_ID': [1, 2, 3, 4, 5],
                   'Em_Name': ['Satish', 'Vani', 'Ramesh', 'Rajesh', 'Virat'],
                   'Salary': [5000, 7500, 10000, 8000, 9500]})
Em
   Em_ID Em_Name  Salary
0      1  Satish    5000
1      2    Vani    7500
2      3  Ramesh   10000
3      4  Rajesh    8000
4      5   Virat    9500

ii. Print elements of 2nd to 4th column of 3rd to 5th row. (1)

Em.iloc[2:5, 1:4]   # iloc is zero-based and excludes the end position
iii. Print elements of all the columns for first two rows. ()

Em.iloc[:2, :]
Q7 a) Consider the code given below:
import pandas as pd
import numpy as np
from datetime import datetime
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.random.randn(6), index=dates)

Provide output for the following code:

i. print(ts) (1)
2011-01-02   -0.510303
2011-01-05    0.466675
2011-01-07   -2.073346
2011-01-08   -1.415322
2011-01-10    0.290394
2011-01-12   -1.828824
dtype: float64
ii. print(ts + ts[::-1]) (1)
2011-01-02   -1.020607
2011-01-05    0.933350
2011-01-07   -4.146693
2011-01-08   -2.830643
2011-01-10    0.580787
2011-01-12   -3.657648
dtype: float64
iii. print(ts.index[0]) (1)

2011-01-02 00:00:00

b) Write a code to convert string of date '2022-10-20' to string of date (3)
'20/10/2022'.
import pandas as pd
st = '2022-10-20'
from datetime import datetime
datetime.strptime(st, '%Y-%m-%d').strftime('%d/%m/%Y')
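The case of the year directive matters here: %Y is the four-digit year and %y the two-digit one. A quick check, with variable names introduced for illustration:

```python
from datetime import datetime

st = '2022-10-20'
parsed = datetime.strptime(st, '%Y-%m-%d')   # string -> datetime

converted = parsed.strftime('%d/%m/%Y')      # four-digit year
two_digit = parsed.strftime('%d/%m/%y')      # two-digit year

print(converted, two_digit)
```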
c) Provide output of the following code: (4)
rng = pd.date_range('2010-01-01', periods=12, freq='T')
ts = pd.Series(np.arange(12), index=rng)
print(ts)
print(ts.resample('5min', closed='right').sum())
print(ts.resample('5min', closed='right', label='right',
                  loffset='-1s').sum())
print(ts.resample('5min').ohlc())
Answer
2010-01-01 00:00:00     0
2010-01-01 00:01:00     1
2010-01-01 00:02:00     2
2010-01-01 00:03:00     3
2010-01-01 00:04:00     4
2010-01-01 00:05:00     5
2010-01-01 00:06:00     6
2010-01-01 00:07:00     7
2010-01-01 00:08:00     8
2010-01-01 00:09:00     9
2010-01-01 00:10:00    10
2010-01-01 00:11:00    11
Freq: T, dtype: int32
2009-12-31 23:55:00     0
2010-01-01 00:00:00    15
2010-01-01 00:05:00    40
2010-01-01 00:10:00    11
Freq: 5T, dtype: int32
2009-12-31 23:59:59     0
2010-01-01 00:04:59    15
2010-01-01 00:09:59    40
2010-01-01 00:14:59    11
Freq: 5T, dtype: int32
                     open  high  low  close
2010-01-01 00:00:00     0     4    0      4
2010-01-01 00:05:00     5     9    5      9
2010-01-01 00:10:00    10    11   10     11
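The loffset argument is no longer accepted by recent pandas releases (it was removed in pandas 2.0). A sketch of an equivalent, assuming a current pandas version, is to resample first and then shift the resulting index by hand; the 'min' alias is used in place of 'T' for the same reason.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2010-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

# Bins are closed on the right edge and labelled by their left edge.
right_closed = ts.resample('5min', closed='right').sum()

# Label by the right edge, then move every label back by one second.
shifted = ts.resample('5min', closed='right', label='right').sum()
shifted.index = shifted.index - pd.Timedelta(seconds=1)

# Open, high, low and close of each five-minute bucket.
bars = ts.resample('5min').ohlc()

print(right_closed)
print(shifted)
print(bars)
```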
