[This question paper contains 12 printed pages.]
Your Roll No. ........
Sr. No. of Question Paper : 1095 C
Unique Paper Code : 32347507
Name of the Paper : Data Analysis and Visualisation
Name of the Course : B.Sc. (Hons.) Computer Science
Semester
Duration: 3 Hours Maximum Marks : 75
Instructions for Candidates
1. Write your Roll No. on the top immediately on receipt of this question paper.
2. Question No. 1 is compulsory.
3. Attempt any four questions out of Q. 2 to Q. 7.
4. Parts of a question must be answered together.
1. (a) Give output of the following code.
(i) import pandas as pd (2)
P.T.O.
(c) (b)
c=4This GivenProvide What Give (iü)
float(s)
fval=(i) (i1)
type(fval) that import matrix
columns=pd.Indéx(['A,B}'C|],name='MyPlot)
the df-pd.DataFrame[[1,1,1],
is he
(2,1,1]],index=['one','two',
t'three','four], print(matrix)
obj3.reindex(range(6),
method='obj3ffill") obj3
spans is the a output
avalue output pivot =
long pandas =[Ü pd.Series(['
index=[0,w4])2,
ow' ,
multiple
of table? for forj
string string of as 2
following df.plot. in
lines Give pd
range(3)]for
object
one bar().
[2,2,2], 'good',
s=3.1456 codes. example.
[1,2, ini 'great],
range(3)]
and 1],
(3) (2) (2) (2)
(i) (h) (g) (t) (e) (d)
output.
Consider as
Write Identify zeros Create maximum function and columns Create
of performed Consider will element 5.
code
Considerc.count(\n')(i)
the
bool(s)(ii)
: column be to
and a then an
code rows and a the find
the and to dataframe the a
following line array arr[2:5], list
insert compute indexes populate resulting given the
to shape are
minimum
terminator read sum seq=
num "Utah', arr
of
a [[1,2,3], are arr[-5:array of [1, 3
piece CSV of the itwith =
the 'b', with [1,2,8,9,3,4,7,5,1 elements 2,
size difference of 'Ohio',
as if 0,
of file array each 'd', random four -1]
these 4,
\n'. [4,5,6]] 2x3
' e'.
code with and 6,
num. column. Texas', rows of 5,
filled Write operations
and new values. arr[::2]. the 2,
between
into and 0,6]. 1].
a value
give delimiter with 'Oregon'
lambda Write
P.T.O. array. Indexthree What
(3) the (3) (3) all (3) the (3) are (2) till
a
    import pandas as pd
    a = pd.DataFrame({'id': [1, 2, 9, 10],
                      'val': ['a', 'b', 'c', 'd']})
    b = pd.DataFrame({'id': [1, 7, 10, 12, 13, 7],
                      'val': ['p', 'q', 'r', 's', 't', 'u']})
    c = pd.merge(a, b, on='id', how='right')
    (i) How many 'NaN' values are in the dataframe 'c'?
    (ii) Drop duplicate values from dataframe 'b' and keep the last duplicated value.
(j) Generate DatetimeIndex of length 20 where each index will be Tuesday of the third week of a month starting from 10-Jan-2022. (3)
(k) Consider dataframe df (4)
    import pandas as pd
    import numpy as np
    df = pd.DataFrame({'key': ['a', 'b', 'c'] * 4,
                       'value': np.arange(12.0)})
    What will be the output of the following statements?
    (i) Print the dataframe df.
    (ii) Write a code to group the dataframe using key.
    (iii) Multiply each group value by 2.
2. (a) Consider a dataframe df as (6)
    import pandas as pd
    import numpy as np
    df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                       'key2': ['one', 'two', 'one', 'two', 'one'],
                       'data1': np.random.randn(5),
                       'data2': np.random.randn(5)})
    Provide the output for the following :
    (i) print(df)
    (ii) m1 = df['data1'].groupby([df['key1'], df['key2']]).mean()
         print(m1)
    (iii) m2 = df['data1'].groupby([df['key1']]).mean()
    (iv) pieces = dict(list(df.groupby('key1')))
         pieces['b']
    (v) for (k1, k2), group in df.groupby(['key1', 'key2']):
            print((k1, k2))
            print(group)
(b) Give output of the following code. Justify.
    (i) val = ['foo', 2, [4, 2]] (2)
        val[2] = ($, 4)
        print(val)
    (ii) var = (3, 5, (4, 5)) (2)
         var[1] = 'two'
         print(var)
3. (a) Given the following list of strings (5)
    List1 = ['Amazon', 'Amazing Amazon', 'Apple', 'Microsoft', 'Apple is good for health', 'I like Microsoft'].
    Using 'List1', generate the following dictionary 'Anydict' where key is the count of words in a string and value is the list of strings having that count.
    Anydict = {1: ['Amazon', 'Apple', 'Microsoft'], 2: ['Amazing Amazon'], 3: ['I like Microsoft'], 4: ['Apple is good for health']}.
(b) Write a code to read the data from a csv file. Find the number of rows and columns in the data, replace missing values with zero, and remove duplicate values. Write the modified data back to the original file. (5)
4. (a) What is the use of generator function? Write a generator function to print square of first n natural numbers where n is user input. (4)
(b) Write a program to draw a scatter plot comparing marks of Mathematics = [88, 92, 80, 89, 100, 80, 60, 100, 80, 34] and Science = [35, 79, 79, 48, 100, 88, 32, 45, 20, 30] subjects.
    Import the necessary libraries.
    Title the plot as 'Marks Comparison' and label y-axis as 'Marks Scored'.
    Assign red color to mathematics marks points and blue color to science marks points. (6)
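A minimal sketch of one possible approach to Q. 3(a) and Q. 4(a), using only the standard library. The helper names group_by_word_count and squares are illustrative assumptions, not names the paper prescribes, and n is fixed here where the question asks for user input. Splitting on whitespace counts five words in 'Apple is good for health', so in this sketch that string lands under key 5 rather than the key 4 printed in the paper.

```python
from collections import defaultdict


def group_by_word_count(strings):
    """Map each word count to the list of strings having that many words."""
    groups = defaultdict(list)
    for text in strings:
        # str.split() with no argument splits on runs of whitespace.
        groups[len(text.split())].append(text)
    return dict(groups)


def squares(n):
    """Yield the squares of the first n natural numbers, one at a time."""
    for k in range(1, n + 1):
        yield k * k


List1 = ['Amazon', 'Amazing Amazon', 'Apple', 'Microsoft',
         'Apple is good for health', 'I like Microsoft']
Anydict = group_by_word_count(List1)
print(Anydict)

# In the exam setting n would come from int(input()); it is fixed here
# (an assumption) so the sketch runs unattended.
n = 5
for value in squares(n):
    print(value)
```

A generator produces each square only when the loop asks for it, so no list of all n squares is held in memory at once.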
5. (a) Consider the following data frame Family containing a family name, gender of the family member and her/his monthly income and expenditure in each record.

    Name     Gender   Monthly Income   Expenditure
    Shahin   Male     114000.00        58000.00
    Vimal    Male      65000.00        32000.00
    Vimala   Female    69500.00        38500.00
    Vimala   Female   155000.00        70000.00
    Karan    Male     103000.00        52000.00
    Shahin   Male      55000.00        18000.00
    Seema    Female   112400.00        60000.00
    Seema    Female    81030.00        25000.00
    Vimal    Male      71900.00        30000.00

    (i) Find correlation between Monthly Income and Expenditure. (1)
    (ii) Use map function to convert each value of Name into uppercase. (2)
    (iii) Create a new data frame Info having a hierarchical index on columns Name and Gender. (2)
(b) Consider the data array = [0.9296, 0.3164, 0.1839, 0.2046, 0.5677, 0.5955, 0.9645, 0.6532, 0.7489, 0.6536] of 10 floating-point values. Write code for following:
    (i) Create 5 bins of the array using the cut method. (1)
    (ii) Create 5 bins of the array using the qcut method. (1)
    (iii) Create 5 bins of the array with precision = 2 using cut method. Also explain the usage of parameter precision. (3)
6. (a) Consider the following code :
    import pandas as pd
    left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'],
                         'key2': ['one', 'two', 'one'], 'lval': [1, 2, 3]})
    right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'],
                          'key2': ['one', 'one', 'one', 'two'],
                          'rval': [4, 5, 6, 7]})
first(1) (1) (1)
index=dates)(1)
(np.random.randn(6), (3) (4) rng-pd.date_range(2
2022-10 -01',periods=12,freq
P.T.O.
011,1,5), datetime(2011,1,10),datetime(2011,1,12)]
for datetime(2011,1,7),datetime
(2011,1,8),
columns
[datetime(2011,1,2),datetime(2
code: date code:
.
of 20/10/2022'
the :below following following
datetime string
all
of given
11 elements the [0]) convert
ts[::-1])
pd
as import
pandas
code date the
two
rows.
for index of
to
Print the datetime =
ts
pd.Series (ts) + (ts. codeof
output(i)
print output
print(ts string
Consider print
(iiil) import = Provide a Provide
fromdates Writeto
(ii) (iii) 20'
1095 (a)
(b) (c)
7.
(2) (2) (2) (2) 3rO (1)
(by='key2, data.
of
SALARY column
above
'"]) key val.cumsum()
lon=[' lascending=False). 5000 7500 100008000 9500
following: values the 4th
cumsum=left.sort_ belowNAME
:following
for to
right, print(prop_cumsum)
: EMP
Ramesh
dataframe
2nd
of
10 the (right)
left.append
(iü) given Satish Rajesh elements
(left, Vani Virat the
of
output pd.merge data for row.
a
code Create Print5n
to
prop_ aConsider
ID
EMP
Provide a
(11)
() Write (i)
(ii)
2 3 4 5
1095 (b)
(1500)
ts = pd.Series(np.arange(12), index=rng)
print(ts)
print(ts.resample('5min', closed='right').sum())
print(ts.resample('5min', closed='right', label='right', loffset='-1s').sum())
ts.resample('5min').ohlc()
12 1095