
2023 Data Analysis and Visualization Using Python


Uploaded by

dgod975
[This question paper contains 16 printed pages.]

Your Roll No. ..........

Sr. No. of Question Paper : 2012 F
Unique Paper Code         : 2344001201
Name of the Paper         : Data Analysis and Visualization Using Python
Name of the Course        : Computer Science: Generic Elective (G.E.)
Semester                  : I
Duration : 3 Hours                          Maximum Marks : 90

Instructions for Candidates

1. Write your Roll No. on the top immediately on receipt of this question paper.
2. This question paper has two sections A and B.
3. Question 1 in Section A is compulsory.
4. Attempt any 4 questions from Section B.
5. Parts of a question must be attempted together.
6. Section A carries 30 marks and each question in Section B carries 15 marks.
7. Use of Calculator is not allowed.

Section A

Assume numpy has been imported as np and pandas has been imported as pd.

1. (a) Explain unimodal, bimodal and multimodal distribution with the help of examples. (5)

   (b) Consider the DataFrames First and Second given below: (5)

       [Tables First and Second, each with columns One and Two; the cell values are illegible in the scan.]

       Consider the following python code segment:

           right = pd.merge(first, second, how='right', on='One')
           left = pd.merge(first, second, how='inner', on='Two')

       Show the content of the new DataFrames right and left.

   (c) Write python commands to create a figure object using matplotlib. The Figure object has one subplot that contains 3 line graphs. Define legend and chart title of the graph. Define a different style and colour for each line in the subplot. Import appropriate libraries. (5)

   (d) List and describe the steps involved in the process of Data Analysis. (5)

   (e) Give the output of the following code snippets: (4)

       (i)  y = np.arange(12).reshape(4,3)
            print(y)
            [illegible] = -1
            print(y)

       (ii) x = np.array([[2, 4], [5, 1]])
            z = np.ones_like(x)
            print(z)
            w = np.eye(2) * x
            print(w)

   (f) Consider the series S1 and S2 given below: (6)

           S1          S2
           A   1       A   3
           B   2       B   6
           C   3       D   7
           D   4       E   8

       Give the output of the following python pandas commands:

       (i)   S1[:3] * 10
       (ii)  S1 + S2
       (iii) S2[::-1] * 5

Section B

2. (a) Consider the DataFrame Frame given below: (7)

           Name    Age   Weight   Height
           Ram     45    45.0     140
           Ravi    25    34.9     160
           Reena   32    45.6     145
           Rita    20    60.7     155
           Rishi   33    54.7     170
           Romi    21    34.6     144

       Write python commands to perform the following operations:

       (i)   Compute the correlation of Age with both Weight and Height.
       (ii)  Sort Frame in descending order of Age.
       (iii) Find the index for the row with minimum Age.
       (iv)  Calculate cumulative sum for Weight for all students.
       (v)   Set the height of 'Rita' and 'Romi' to NA.
       (vi)  Replace the value 32 with 18 and 33 with 19 in the Age column.
       (vii) Define a map function to convert values of the Name column to upper case.

   (b) Refer to the DataFrame Frame given in question 2 (a). Write a python program to perform the following operations in the given dataset with columns Name, Age, Weight, Height: (8)

       (i)   Create a figure and include 2 subplots in it.
       (ii)  In the first subplot create a scatter plot between two variables Age and Height.
       (iii) In the second subplot draw a horizontal bar plot between Name and Weight.
       (iv)  Set the title for the figure as 'Data Analysis'.
       (v)   Give appropriate labels for x and y axis.
       (vi)  Save the figure to file with name 'analysis.png'.

3. (a) Consider the following numpy array matrix: (10)

           [[5, 10, 20],
            [20, 13, 43],
            [34, 27, 67],
            [12, 46, 77]]

       Give the output of the following numpy commands:

       (i)    matrix.T
       (ii)   matrix[:1, 1:]
       (iii)  matrix[[1, 3, 0], [2, 1, 0]]
       (iv)   matrix[[-2, -4]]
       (v)    matrix[[True, False, False, True]]
       (vi)   matrix[3][:2]
       (vii)  matrix[::-1]
       (viii) matrix.ndim
       (ix)   np.swapaxes(matrix, 1, 0)
       (x)    matrix + 10

   (b) Consider the following DataFrame df.

           Items    Sugar Type   Price
           Yogurt   Low Fat      45
           Chips    Regular      30
           Soda     Low Fat      50
           Yogurt   High Fat     70
           Cake     Regular      140
           Chips    Low Fat      40
           Yogurt   Regular      50

       Give commands to perform the following operations:

       (i)   List the name of unique items sold.
       (ii)  Count the number of times each value in Items is stored.
       (iii) Delete the rows which have duplicate values of Items.
       (iv)  Give the average price of all Low Fat items.
       (v)   Check if 'Juice' is one of the items sold.

4. (a) Consider the DataFrame data given below. (4)

           One   Two   Three   Four   Five
           1     14    34      NaN    NaN
           34    21    NaN     12     NaN
           NaN   23    NaN     2      NaN
           34    21    32      33     NaN

       Write python commands to perform the following operations:

       (i)   Drop columns with any null values.
       (ii)  Replace the null values with the mean of each column.
       (iii) Drop the null values where there are at least 2 null values in a row.
       (iv)  Replace all null values by the last known valid observation.

   (b) What are outliers? How can you detect outliers using boxplots? (5)

   (c) Consider the given numpy array mat: (6)

           mat = np.array([[[-1, 2], [3, 4]], [[-5, 6], [7, 8]]])

       Write numpy commands to perform the following operations:

       (i)   Create an array of zeros with the same shape as mat.
       (ii)  Print the shape of mat.
       (iii) Print the datatype of the elements in mat.
       (iv)  Print the elements which are greater than 6 in mat.
       (v)   Convert all the elements in mat to float type.
       (vi)  Multiply each element in mat with 25.

5. (a) Give the python commands to create a dictionary with 5 keys 'A', 'B', 'C', 'D', 'E' and values as follows. (10)

           Key   Value
           A     List of numbers from 1 to 10 skipping 2 at a time.
           B     List of strings from A to B.
           C     List of 5 numbers obtained using random normal distribution function.
           D     List of 5 random integers from 20 to 30.
           E     Square root of 5 random numbers from 50 to 70.

       Give python commands to perform the following operations:

       (i)   Create DataFrame data using the above dictionary.
       (ii)  Convert column A to index.
       (iii) Rename the rest of the columns as Area, Temperature, Latitude and Longitude.
       (iv)  Delete the column Longitude from data.
       (v)   Save data as a csv with separator as *

   (b) Write a python code to create a figure and a subplot using matplotlib functions. Plot a rectangle of size 3.5 x 8.5 at point (2.0, 7.0), a circle of radius 2.5 at point (7.0, 2.0) as patches in the subplot, using functions for plotting. Set the colour of rectangle as 'Green' and colour of circle as 'Blue'. Set the x-scale and y-scale to 1-10. Import appropriate libraries. (5)

6. (a) Consider the following dataset student. (10)

           Year          Name    Roll No       Marks         Age
           [illegible]   Rani    23            70            18
           2             Rita    24            75            20
           3             Raj     25            80            [illegible]
           1             Rahul   [illegible]   [illegible]   [illegible]
           [any further rows are illegible in the scan]

       Give the output of the following python commands:

       (i)   student[['Roll No', 'Name']][2:4]
       (ii)  student[student['Age'] > 20]
       (iii) student[student['Age'] > 20]['Name']
       (iv)  avg_marks = np.mean(student.Marks)
             student[student['Marks'] > avg_marks]
       (v)   first = student[student['Year'] == 1]['Marks']
             np.mean(first)

   (b) Consider the following list l1. (5)

           l1 = [10, 10, 20, 40, 50, 60, 70, 80, 90, 90]

       Discretise l1 into 4 bins using cut() and qcut(). Give the names ['first', 'second', 'third', 'fourth'] to the bins. What type of object is returned by pandas after binning? What output is generated by the attributes codes and categories of the binning object?

7. (a) Consider the DataFrame df given below: (8)

           EmployeeID   Department    Salary
           1001         English       1000
           1002         English       1002
           1003         English       1004
           1004         English       1005
           1003         Maths         1004
           1004         Maths         1005
           1002         [illegible]   1006
           1002         Maths         1002

       Write the python code to perform the following operations:

       (i)   Create a hierarchical index on Department and EmployeeID.
       (ii)  Give the summary level statistics for each column.
       (iii) Give the output for the following:
             1. df.stack()
             2. df.unstack()

   (b) Give the output of the following code segment: (4)

           arr = np.array([89, 54, 76, 32, 47, 21, 92, 39, 82])
           arr1 = arr[5:9]
           arr2 = arr[5:9].copy()
           arr1 = 36
           arr2 = 7
           print(arr)
           print(arr1)
           print(arr2)

   (c) Consider the series a given below and give the output of the following commands: (3)

           a = pd.Series([4, 1, 7, 1, 8, 9, 0, 8, 2, 3, 9])

       (i)   a.rank()
       (ii)  a.rank(method='first')
       (iii) a.rank(ascending=False)
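
For question 7 (c), a minimal sketch of how Series.rank treats tied values may help when checking answers. It uses a small hypothetical series s, not the series a from the paper, and only the method and ascending parameters that appear in the question.

```python
import pandas as pd

# Hypothetical series with one tie (the value 3 appears twice).
s = pd.Series([3, 1, 3, 2])

# Default method='average': tied values share the mean of their ranks.
avg_rank = s.rank()

# method='first': tied values are ranked in their order of appearance.
first_rank = s.rank(method="first")

# ascending=False: the largest value gets rank 1; ties are still averaged.
desc_rank = s.rank(ascending=False)

print(avg_rank.tolist())    # [3.5, 1.0, 3.5, 2.0]
print(first_rank.tolist())  # [3.0, 1.0, 4.0, 2.0]
print(desc_rank.tolist())   # [1.5, 4.0, 1.5, 3.0]
```

The same three calls applied to the series a in the question follow the same tie rules; only the data differs.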
