Python - DataScience Question - Paper

The document contains questions related to pandas data structures and operations. It includes multiple choice questions and a coding exercise on grouping and sorting a dataframe. The coding exercise asks to group a dataframe by system and speed per day, calculate the median speed for each group, add a column with the name 'Median', and sort the dataframe by median speed.

Uploaded by

ASHUTOSH TRIVEDI
Copyright
© All Rights Reserved

Interview Questions

1. What are the different types of data structures in pandas?
2. What is a Series in pandas?
3. What is a DataFrame in pandas?
4. What are the significant features of the pandas library?
5. In what different ways can a DataFrame be created in pandas?
6. How will you create an empty DataFrame in pandas?
7. What is GroupBy in pandas?
8. What are the various features of NumPy?
9. What advantages do NumPy arrays have over (nested) Python lists?
10. How do you create a NumPy array from a Python list?
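
Questions 2, 3, 6, 7 and 10 above can be illustrated with a minimal sketch. All variable names and values below are made up for illustration and are not part of the original paper.

```python
import numpy as np
import pandas as pd

# Series: a one-dimensional labelled array (question 2)
marks = pd.Series([67, 90, 66], index=["a", "b", "c"])

# DataFrame: a two-dimensional labelled table, built here from a dict of lists (question 3)
scores = pd.DataFrame({"team": ["x", "x", "y"], "score": [1, 3, 10]})

# An empty DataFrame has no rows and no columns (question 6)
empty_df = pd.DataFrame()

# GroupBy: split rows by key, aggregate each group, combine the results (question 7)
mean_by_team = scores.groupby("team")["score"].mean()

# NumPy array from a Python list (question 10)
arr = np.array([1, 2, 3])

print(marks["b"])
print(empty_df.empty)
print(mean_by_team)
print(arr)
```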

MCQ
Q1 of 10

Which of the following functions can be used to read the dataset from a comma separated values file?

a) read_csv
b) read_xlsx
c) read_json
d) read_excel

Q2 of 10

Which of the following can be used to create a DataFrame in pandas?

a) List of dictionaries
b) A python dictionary
c) Python series
d) All of the given options

Q3 of 10

What will be the output of the following code?

import pandas as pd
df = pd.DataFrame({'Employee_Id':['A','B','C','D','E'],'Count':[100,200,300,400,250]})
df.rename(columns = {'Count':'Employee_Count'})
print(df.columns)

a) ['Employee_Id', 'Employee_Count']
b) ['Employee_Id', 'Count']
c) Error
d) None of the above

Q4 of 10

What will be the output of the code snippet below?


import pandas as pd
ser = pd.Series([10,20,30,40,50], index = [1,2,3,4,5])
print(ser.iloc[1])

a) 10
b) 20
c) Runtime error
d) 1

Q5 of 10

What will be the output of the code snippet below?


import pandas as pd
ser = pd.Series([10,20,30,40,50], index = [1,2,3,4,5])
print(ser.loc[1])

a) 10
b) 20
c) Runtime error
d) 1

Q6 of 10

What will be the output of the code snippet below?


import numpy as np
import pandas as pd
rng = np.array(list(range(10,20)))
ser = pd.Series(rng)
ser += 10
print(ser.iloc[7])

a) 16
b) 26
c) 27
d) 10

Q7 of 10

Fill in the blank to get the given output:


import pandas as pd
df = pd.DataFrame({'Chemistry': [67,90,66,32],
                   'Physics': [45,92,72,40],
                   'Mathematics': [50,87,81,12],
                   'English': [19,90,72,68]})

_________________________

df

a. df += 10
b. df + [10,10,10,10]
c. df += [10, 20, 10, 10]
d. df + [10, 20, 10, 10]

Q8 of 10

Which of the following will sort the data according to the marks in 'Physics'?

import pandas as pd
df = pd.DataFrame({'Chemistry': [67,90,66,32],
                   'Physics': [45,92,72,40],
                   'Mathematics': [50,87,81,12],
                   'English': [19,90,72,68]})

a. df.groupby('Physics')
b. df.sort_values(by = 'Physics')
c. df.sort(by = 'Physics')
d. df.sortby('Physics')

Q9 of 10

State True or False:

In pandas, the merge function performs an inner join by default.

a. True
b. False

Q10 of 10

Given the DataFrame df below, choose the output printed when the snippet executes:

import pandas as pd
df = pd.DataFrame([[54.2,'a'],[658,'d']],
                  index = list('pq'))
df.columns = df.index
print(df.columns.values)

a. [0, 1]
b. ['p', 'q']
c. RangeIndex(start=0, stop=2, step=1)
d. Index(['p', 'q'], dtype='object')

Coding

Exercise
Problem Statement:
Given a dataframe df with three attributes: set_name (system names), spd_per_day (speed per day), and speed (network speed in MBps).

import pandas as pd

sys = ['s1','s1','s1','s1',
       's2','s2','s2','s2']
net_day = ['d1','d1','d2','d2',
           'd1','d1','d2','d2']
spd = [1.3, 11.4, 5.6, 12.3,
       6.2, 1.1, 20.0, 8.8]
df = pd.DataFrame({'set_name':sys,
                   'spd_per_day':net_day,
                   'speed':spd})

Do the following:

1. Construct a dataframe new_df in which the given dataset is grouped by system (s1 and s2) and speed per day (d1 and d2), holding the median speed for each day per system. Also give the speed attribute the secondary name 'Median'.
2. Sort the dataframe new_df in ascending order of the median speed.
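
One possible approach to the two steps above is sketched here. It assumes that "secondary name" means renaming the aggregated speed column to 'Median'; other readings (for example, a two-level column header) are possible. The list name sys_names is an illustrative substitute for sys, chosen to avoid shadowing Python's standard sys module.

```python
import pandas as pd

# Data from the problem statement (sys renamed to sys_names for illustration)
sys_names = ['s1', 's1', 's1', 's1', 's2', 's2', 's2', 's2']
net_day = ['d1', 'd1', 'd2', 'd2', 'd1', 'd1', 'd2', 'd2']
spd = [1.3, 11.4, 5.6, 12.3, 6.2, 1.1, 20.0, 8.8]
df = pd.DataFrame({'set_name': sys_names,
                   'spd_per_day': net_day,
                   'speed': spd})

# Step 1: group by system and day, take the median speed of each group,
# and name the resulting column 'Median' (assumed reading of "secondary name")
new_df = (
    df.groupby(['set_name', 'spd_per_day'])['speed']
      .median()
      .rename('Median')
      .reset_index()
)

# Step 2: sort in ascending order of the median speed
new_df = new_df.sort_values(by='Median').reset_index(drop=True)

print(new_df)
```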

Problem Statement:
The rainfall dataset is to be used for this exercise. The data contains region-wise (district-wise) rainfall across India.

1. Import the data into Python environment as a Pandas DataFrame.
2. Check for missing values, if any, and drop the corresponding rows.
3. Find the district that gets the highest annual rainfall.
4. Display the top 5 states that get the highest annual rainfall.
5. Drop the columns 'Jan-Feb', 'Mar-May', 'Jun-Sep', 'Oct-Dec'.
6. Display the state-wise mean rainfall for all the months using a pivot table.
7. Display the count of districts in each state.
8. For each state, display the district that gets the highest rainfall in May. Also display the recorded rainfall.
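
The steps above might be approached as in the rough sketch below. The actual dataset is not included in this paper, so the sketch runs on a tiny made-up table: the column names STATE, DISTRICT, MAY, JUN and ANNUAL and every value are assumptions, and only the four seasonal column names come from the exercise text. It also assumes a state's annual rainfall is the sum over its districts; a mean would be an equally defensible reading.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the rainfall file (step 1 with a real file
# would use pd.read_csv on that file instead). Only two months are shown.
rain = pd.DataFrame({
    "STATE":    ["A", "A", "B", "B", "B"],
    "DISTRICT": ["a1", "a2", "b1", "b2", "b3"],
    "MAY":      [10.0, 30.0, 5.0, 50.0, np.nan],
    "JUN":      [100.0, 80.0, 20.0, 60.0, 40.0],
    "ANNUAL":   [900.0, 700.0, 300.0, 1200.0, 500.0],
    "Jan-Feb":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "Mar-May":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "Jun-Sep":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "Oct-Dec":  [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Step 2: count missing values per column, then drop rows containing any
print(rain.isnull().sum())
rain = rain.dropna()

# Step 3: district with the highest annual rainfall
wettest = rain.loc[rain["ANNUAL"].idxmax(), "DISTRICT"]

# Step 4: top 5 states by annual rainfall (assumed: summed over districts)
top_states = rain.groupby("STATE")["ANNUAL"].sum().nlargest(5)

# Step 5: drop the seasonal columns named in the exercise
rain = rain.drop(columns=["Jan-Feb", "Mar-May", "Jun-Sep", "Oct-Dec"])

# Step 6: state-wise mean rainfall per month via a pivot table
mean_by_state = pd.pivot_table(rain, index="STATE",
                               values=["MAY", "JUN"], aggfunc="mean")

# Step 7: number of districts in each state
district_counts = rain.groupby("STATE")["DISTRICT"].count()

# Step 8: per state, the district with the highest May rainfall and its value
may_max = rain.loc[rain.groupby("STATE")["MAY"].idxmax(),
                   ["STATE", "DISTRICT", "MAY"]]

print(wettest, top_states, mean_by_state, district_counts, may_max, sep="\n\n")
```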
