Python MCQs

The document contains multiple-choice questions (MCQs) on Python data analysis with Pandas, covering data loading, merging, handling missing values, removing duplicates, data analysis, visualization, and advanced scenarios. Each question presents a practical scenario with options, and the correct answer is provided for each. The document serves as a study guide for anyone looking to strengthen their data analysis skills with Python and Pandas.

Al-Beruni City: Python Data Analysis MCQs (Practical Scenarios)

Section 1: Data Loading and Initial Inspection

1. Scenario: You need to load Departments.csv into a DataFrame df1 and Tools.csv into df2. Which library
must be imported first?

o a) import matplotlib.pyplot as plt

o b) import numpy as np

o c) import pandas as pd

o d) import csv Answer: c) import pandas as pd

2. Scenario: After loading df1 and df2 using pd.read_csv(), you want to verify the first row of df1. Which
command achieves this?

o a) print(df1.first())

o b) print(df1.loc[1])

o c) print(df1.head(1))

o d) print(df1.iloc[1]) Answer: c) print(df1.head(1))

3. Scenario: Imagine Tools.csv has 10 columns and 500 rows (excluding the header). After loading it into df2,
what would df2.shape return?

o a) (10, 500)

o b) (500, 10)

o c) (501, 10)

o d) (500, 11) Answer: b) (500, 10)

4. Scenario: If Departments.csv failed to load correctly due to an incorrect file path, what type of error would
Python typically raise?

o a) TypeError

o b) ValueError

o c) FileNotFoundError

o d) KeyError Answer: c) FileNotFoundError
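
A minimal sketch tying Questions 1-4 together, assuming Departments.csv and Tools.csv sit in the working directory:

import pandas as pd

try:
    df1 = pd.read_csv('Departments.csv')  # departments table
    df2 = pd.read_csv('Tools.csv')        # tools table
except FileNotFoundError as err:
    print(f'Bad path: {err}')             # raised when the file path is wrong
else:
    print(df1.head(1))                    # first data row of df1
    print(df2.shape)                      # (rows, columns), header row excluded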

Section 2: Merging and Missing Values

5. Scenario: You need to merge df1 (Departments) and df2 (Tools) keeping all rows from df2. Which
pd.merge parameter is crucial for this?

o a) how='inner'

o b) how='left', left_on='Abb', right_on='Abb' (assuming df1 is left)

o c) how='outer', on='Abb'

o d) how='right', left_on='Abb', right_on='Abb' (assuming df1 is left) Answer: d) how='right', left_on='Abb', right_on='Abb' (keeps all rows from the right DataFrame, df2)
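
A sketch of the right merge from Question 5, assuming both frames carry the shared key column 'Abb':

merged_df = pd.merge(df1, df2, how='right', left_on='Abb', right_on='Abb')
# equivalent here, since the key name matches in both frames:
# merged_df = pd.merge(df1, df2, how='right', on='Abb')
print(len(merged_df) >= len(df2))  # True: every row of df2 is retained
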
6. Scenario: After merging, you run merged_df.isnull().sum(). The output shows Department 50. What does
this indicate?

o a) The 'Department' column has 50 unique values.

o b) 50 rows have missing values across all columns.

o c) The 'Department' column contains the string "50" in some rows.

o d) There are 50 missing (NaN) values specifically in the 'Department' column. Answer: d) There
are 50 missing (NaN) values specifically in the 'Department' column.

7. Scenario: You need to create a Python dictionary dept_map where keys are 'Abb' and values are
'Department' from df1 to help fill missing values. Assuming 'Abb' is unique in df1, which code snippet
works?

o a) dept_map = df1.groupby('Abb')['Department'].to_dict()

o b) dept_map = dict(df1[['Abb', 'Department']].values)

o c) dept_map = df1.set_index('Abb')['Department'].to_dict()

o d) dept_map = {row['Abb']: row['Department'] for index, row in df1.iterrows()} Answer: c) dept_map = df1.set_index('Abb')['Department'].to_dict() (Option 'd' also works but iterrows() is generally slower than vectorized methods like 'c'. Option 'b' also builds the dictionary, since dict() accepts an iterable of key/value pairs. Option 'a' is invalid: a GroupBy object has no to_dict() method.)

8. Scenario: To fill missing 'Department' values in merged_df using the dept_map created previously, which is
the most appropriate Pandas method?

o a) merged_df['Department'].fillna(merged_df['Abb'].apply(lambda x: dept_map.get(x)),
inplace=True)

o b) merged_df['Department'] = merged_df['Department'].replace(np.nan,
merged_df['Abb'].map(dept_map))

o c) merged_df['Department'].fillna(merged_df['Abb'].map(dept_map), inplace=True)

o d) merged_df.loc[merged_df['Department'].isnull(), 'Department'] = merged_df['Abb'].map(dept_map) Answer: c) merged_df['Department'].fillna(merged_df['Abb'].map(dept_map), inplace=True) (Option 'd' also works; 'a' uses the less direct apply(); 'b' uses replace(), which is not the standard way to fill NaNs from a map.)
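
Putting Questions 7 and 8 together, a hedged sketch that builds the map and fills the gaps (plain assignment is used instead of inplace=True, which modern pandas discourages on a single column):

dept_map = df1.set_index('Abb')['Department'].to_dict()
merged_df['Department'] = merged_df['Department'].fillna(merged_df['Abb'].map(dept_map))
# the check from Question 9; holds only if every 'Abb' in merged_df appears in dept_map
print(merged_df['Department'].isnull().sum() == 0)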

9. Scenario: After filling missing values, you want to confirm there are no NaNs left in the entire DataFrame
merged_df. Which command returns True if there are absolutely no missing values?

o a) merged_df.isnull().sum().sum() == 0

o b) merged_df.notnull().all().all()

o c) merged_df.isna().any().any() == False

o d) All of the above Answer: d) All of the above

10. Scenario: You need to save the cleaned merged DataFrame to a CSV file, excluding the DataFrame index.
Which parameter in to_csv() achieves this?

o a) index=False

o b) header=False
o c) save_index=False

o d) no_index=True Answer: a) index=False

Section 3: Removing Duplicates

11. Scenario: You need to remove duplicate rows based only on the combination of 'Abb' and 'Tool' columns in
merged_df. Which command is correct?

o a) unique_df = merged_df.drop_duplicates()

o b) unique_df = merged_df.drop_duplicates(subset=['Abb', 'Tool'])

o c) unique_df = merged_df.remove_duplicates(on=['Abb', 'Tool'])

o d) unique_df = merged_df[~merged_df.duplicated(subset=['Abb', 'Tool'])] Answer: b) unique_df = merged_df.drop_duplicates(subset=['Abb', 'Tool']) (Option 'd' also works, but 'b' is more direct.)

12. Scenario: merged_df has shape (1000, 12). After running unique_df =
merged_df.drop_duplicates(subset=['Abb', 'Tool']), unique_df.shape is (950, 12). How many rows were
identified as duplicates and removed?

o a) 950

o b) 12

o c) 1000

o d) 50 Answer: d) 50 (1000 - 950)

13. Scenario: When removing duplicates using drop_duplicates(subset=['Abb', 'Tool']), which duplicate row is
kept by default?

o a) The last occurring row.

o b) The first occurring row.

o c) A randomly selected row.

o d) No rows are kept if duplicates exist. Answer: b) The first occurring row (controlled by the keep
parameter, which defaults to 'first').

14. Scenario: You save the unique_df to a CSV named '12345-Unique.csv', where 12345 is your CRN. Which
pandas function call achieves this?

o a) unique_df.save_csv('12345-Unique.csv', index=False)

o b) unique_df.to_excel('12345-Unique.csv', index=False)

o c) unique_df.to_csv('12345-Unique.csv', index=False)

o d) pd.write_csv(unique_df, '12345-Unique.csv', index=False) Answer: c) unique_df.to_csv('12345-Unique.csv', index=False)
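
A short sketch covering Questions 11-14: deduplicate on the two key columns, quantify the removed rows, and save without the index (12345 stands in for your CRN):

before_rows = merged_df.shape[0]
unique_df = merged_df.drop_duplicates(subset=['Abb', 'Tool'], keep='first')
print(before_rows - unique_df.shape[0], 'duplicate rows removed')
unique_df.to_csv('12345-Unique.csv', index=False)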

Section 4: Data Analysis

15. Scenario: You need to find the department with the highest variety (count of unique values) of 'Analysis'
types using the unique_df. Which code snippet finds the count of unique analysis types per department?

o a) unique_df.groupby('Department')['Analysis'].count()
o b) unique_df.groupby('Department')['Analysis'].value_counts()

o c) unique_df.groupby('Department')['Analysis'].nunique()

o d) unique_df['Department'].nunique() Answer: c)
unique_df.groupby('Department')['Analysis'].nunique()

16. Scenario: Following the previous question, how do you get the name of the department with the maximum
unique count? Let the result of the previous step be stored in a Series analysis_variety.

o a) analysis_variety.max()

o b) analysis_variety.idxmax()

o c) analysis_variety.sort_values(ascending=False).index[0]

o d) Both b and c Answer: d) Both b and c
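
A sketch for Questions 15-16, assuming unique_df has the 'Department' and 'Analysis' columns described above:

analysis_variety = unique_df.groupby('Department')['Analysis'].nunique()
dept_max_variety = analysis_variety.idxmax()  # label of the maximum entry
# equivalent: analysis_variety.sort_values(ascending=False).index[0]
print(dept_max_variety, analysis_variety.max())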

17. Scenario: Task (d)(ii) asks for the "percentage of updating of each tool". Assuming the 'Updated' column
contains Boolean values (True/False) or 1/0, how could you calculate the percentage of entries for each
unique tool name that are marked as 'Updated' (True/1)?

o a) unique_df.groupby('Tool')['Updated'].mean() * 100

o b) unique_df['Updated'].value_counts(normalize=True) * 100

o c) unique_df.groupby('Tool')['Updated'].sum() / unique_df.groupby('Tool')['Updated'].count() *
100

o d) Both a and c Answer: d) Both a and c (mean() on boolean/1-0 data calculates the proportion of
True/1s).

18. Scenario: If a specific tool 'ToolX' appears 10 times in unique_df, and 3 of these entries have Updated ==
True, what would unique_df.groupby('Tool')['Updated'].mean().loc['ToolX'] return?

o a) 3

o b) 0.3

o c) 30

o d) 7 Answer: b) 0.3 (The mean of [True, True, True, False, False, False, False, False, False, False]
treated as [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] is 3/10 = 0.3).
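
A sketch of the update-percentage calculation from Questions 17-18, assuming 'Updated' holds booleans or 1/0:

pct_updated = unique_df.groupby('Tool')['Updated'].mean() * 100
print(pct_updated.loc['ToolX'])  # 30.0 for the hypothetical 3-out-of-10 case above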

Section 5: Data Visualization

19. Scenario: To create a bar chart showing the count of tools per department, you first need to calculate these
counts. Which code prepares the data counts_per_dept for plotting?

o a) counts_per_dept = unique_df.groupby('Department')['Tool'].nunique()

o b) counts_per_dept = unique_df['Department'].value_counts()

o c) counts_per_dept = unique_df.groupby('Department').size()

o d) Both b and c Answer: d) Both b and c (value_counts() on the Department column or grouping by
Department and using size() or count() will give the number of rows/tool entries per department).

20. Scenario: You have the counts_per_dept Series. Which command using pandas plotting interface generates
the required vertical bar chart?
o a) counts_per_dept.plot(kind='pie')

o b) counts_per_dept.plot(kind='bar')

o c) counts_per_dept.plot(kind='line')

o d) counts_per_dept.plot.barh() Answer: b) counts_per_dept.plot(kind='bar')

21. Scenario: For the pie chart of the 'Analysis' column distribution, what data does the size of each slice
represent?

o a) The number of departments using that analysis type.

o b) The average usage date of that analysis type.

o c) The relative frequency (percentage) of each analysis type in the dataset.

o d) The number of tools performing that analysis type. Answer: c) The relative frequency
(percentage) of each analysis type in the dataset.

22. Scenario: Which code generates the data needed for the 'Analysis' pie chart?

o a) analysis_counts = unique_df.groupby('Analysis').size()

o b) analysis_counts = unique_df['Analysis'].value_counts()

o c) analysis_counts = unique_df['Analysis'].unique()

o d) Both a and b Answer: d) Both a and b

23. Scenario: Task (e)(iii) requires a bar plot showing the number of tools marked as "Updated". If you
interpret this as comparing the total count of 'Updated' entries vs 'Not Updated' entries, what data source
(Series) would you plot?

o a) unique_df['Updated'].value_counts()

o b) unique_df.groupby('Updated').size()

o c) unique_df.groupby('Tool')['Updated'].sum()

o d) Both a and b Answer: d) Both a and b

24. Scenario: Given the data updated_counts = unique_df['Updated'].value_counts(), which command generates the bar plot comparing the counts of True/False (or Yes/No) values?

o a) updated_counts.plot(kind='pie')

o b) updated_counts.plot(kind='barh')

o c) updated_counts.plot(kind='line')

o d) updated_counts.plot(kind='bar') Answer: d) updated_counts.plot(kind='bar')
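
A plotting sketch for this section, assuming the columns described above; matplotlib renders the pandas plot calls:

import matplotlib.pyplot as plt

counts_per_dept = unique_df['Department'].value_counts()
counts_per_dept.plot(kind='bar')                     # vertical bars: tools per department
plt.tight_layout(); plt.show()

analysis_counts = unique_df['Analysis'].value_counts()
analysis_counts.plot(kind='pie', autopct='%1.1f%%')  # slice size = relative frequency
plt.ylabel(''); plt.show()

updated_counts = unique_df['Updated'].value_counts()
updated_counts.plot(kind='bar')                      # counts of Updated vs Not Updated
plt.show()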

Section 6: Python/Pandas Concepts & Advanced Scenarios

25. Scenario: If the 'Date' column was loaded as strings instead of datetime objects, which Pandas function is
used to convert it correctly?

o a) pd.to_datetime(unique_df['Date'])

o b) unique_df['Date'].astype('datetime64[ns]')
o c) pd.convert_dtypes(unique_df['Date'])

o d) Both a and b Answer: d) Both a and b

26. Scenario: You want to calculate the standard deviation of the number of tools used per department. Which
sequence of operations is needed?

o a) Calculate value_counts() on 'Department', then apply .std().

o b) Group by 'Department', count 'Tool' (size()), then apply .std() to the resulting Series.

o c) Calculate standard deviation directly on the 'Tool' column.

o d) Use unique_df.describe() and find the 'std' row for 'Department'. Answer: b) Group by
'Department', count 'Tool' (size()), then apply .std() to the resulting Series.

27. Scenario: Suppose you want to find if there's a correlation between the number of tools a department uses
and the variety of analysis types it employs. Which correlation method in Pandas would be suitable after
calculating these two series (tools_count and analysis_variety)?

o a) tools_count.corr(analysis_variety)

o b) pd.DataFrame({'tools': tools_count, 'variety': analysis_variety}).corr()

o c) np.correlate(tools_count, analysis_variety)

o d) Both a and b provide the correlation coefficient between the two measures. Answer: d) Both a
and b provide the correlation coefficient between the two measures.

28. Scenario: Which NumPy function could be used to efficiently check if any value in the 'Updated' column
(once converted to boolean) is True?

o a) np.sum(unique_df['Updated'])

o b) np.any(unique_df['Updated'])

o c) np.all(unique_df['Updated'])

o d) np.mean(unique_df['Updated']) Answer: b) np.any(unique_df['Updated'])

29. Scenario: Imagine you want to select all rows from unique_df where the 'Tool desc' column contains the
word "AI". Which Pandas string method is appropriate?

o a) unique_df[unique_df['Tool desc'].contains('AI')]

o b) unique_df[unique_df['Tool desc'].str.contains('AI')]

o c) unique_df[unique_df['Tool desc'].find('AI') != -1]

o d) unique_df[unique_df['Tool desc'].match('AI')] Answer: b) unique_df[unique_df['Tool desc'].str.contains('AI')]

30. Scenario: To calculate the median 'first used' Date for tools within each 'Analysis' type, you would group
by 'Analysis' and then apply which aggregation function to the 'Date' column (assuming it's datetime)?

o a) .mean()

o b) .count()

o c) .median()
o d) .mode() Answer: c) .median()

31. Scenario: If you create a new column 'Years Since First Use' based on the 'Date' column (datetime) and the
current date (pd.Timestamp.now()), which expression calculates this approximately?

o a) (pd.Timestamp.now() - unique_df['Date']).dt.years

o b) (pd.Timestamp.now().year - unique_df['Date'].dt.year)

o c) (pd.Timestamp.now() - unique_df['Date']) / np.timedelta64(1, 'Y')

o d) Both b and c provide valid ways to estimate years (b is simpler integer difference, c is more
precise float). Answer: d) Both b and c provide valid ways to estimate years (b is simpler integer
difference, c is more precise float).
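
A sketch of the two estimates from Question 31, assuming 'Date' is already datetime (convert with pd.to_datetime first if not):

import numpy as np

# simple integer difference of calendar years
unique_df['Years Since First Use'] = pd.Timestamp.now().year - unique_df['Date'].dt.year
# more precise float, dividing the timedelta by an average-length year
years_float = (pd.Timestamp.now() - unique_df['Date']) / np.timedelta64(1, 'Y')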

32. Scenario: Applying a function row-by-row using .apply(..., axis=1) is generally:

o a) More efficient than vectorized operations in Pandas/NumPy.

o b) Less efficient than vectorized operations in Pandas/NumPy.

o c) The only way to perform complex row-wise calculations.

o d) Primarily used for merging DataFrames. Answer: b) Less efficient than vectorized operations in
Pandas/NumPy.

33. Scenario: If df1 had 10 departments and df2 had tools used by only 8 of these departments, what would be
the result of an inner merge on 'Abb'?

o a) Rows corresponding to all 10 departments.

o b) Rows corresponding to only the 8 departments present in df2.

o c) Rows corresponding to the 2 departments only present in df1.

o d) An error because not all keys match. Answer: b) Rows corresponding to only the 8 departments
present in df2.

34. Scenario: Which Python data structure is returned by df1.set_index('Abb')['Department'] before calling
.to_dict()?

o a) A NumPy array

o b) A Python list

o c) A Pandas DataFrame

o d) A Pandas Series Answer: d) A Pandas Series

35. Scenario: To find tools used only by the 'Education' department, which approach is most direct using
pandas filtering and grouping?

o a) unique_df.groupby('Tool').filter(lambda x: (x['Department'] == 'Education').all() and len(x['Department'].unique()) == 1)

o b) tool_counts = unique_df.groupby('Tool')['Department'].nunique(); single_dept_tools = tool_counts[tool_counts == 1].index; unique_df[(unique_df['Tool'].isin(single_dept_tools)) & (unique_df['Department'] == 'Education')]

o c) unique_df[unique_df['Department'] == 'Education']['Tool'].unique() (This gets tools used by Education, not only by Education.)

o d) Both a and b achieve the goal (b is often more readable). Answer: d) Both a and b achieve the goal (b is often more readable).
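
A runnable sketch of option b, which narrows to tools whose only user is the 'Education' department:

dept_per_tool = unique_df.groupby('Tool')['Department'].nunique()
single_dept_tools = dept_per_tool[dept_per_tool == 1].index
education_only = unique_df[
    unique_df['Tool'].isin(single_dept_tools)
    & (unique_df['Department'] == 'Education')
]['Tool'].unique()
print(education_only)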

36. Scenario: If the 'Updated' column contained strings like 'Yes', 'No', 'YES', 'no', what's the first step using
pandas string methods before mapping to Boolean?

o a) unique_df['Updated'].str.upper()

o b) unique_df['Updated'].str.lower()

o c) unique_df['Updated'].str.capitalize()

o d) unique_df['Updated'].str.strip() Answer: b) unique_df['Updated'].str.lower() (or .str.upper(); the point is to standardize case before mapping).

37. Scenario: Displaying the .shape before and after removing duplicates directly quantifies:

o a) The number of missing values handled.

o b) The number of rows removed due to duplication based on the specified columns.

o c) The number of columns remaining after cleaning.

o d) The change in memory usage of the DataFrame. Answer: b) The number of rows removed due to
duplication based on the specified columns.

38. Scenario: Which library is most commonly used alongside Pandas for numerical operations and underpins
many Pandas functionalities?

o a) Matplotlib

o b) SciPy

o c) NumPy

o d) Scikit-learn Answer: c) NumPy

39. Scenario: If you were asked to build a function that takes a department name as input and returns a list of
unique tools used by that department from unique_df, which code structure would be appropriate?

o a) def get_tools(dept_name): return unique_df[unique_df['Department'] == dept_name]['Tool'].tolist()

o b) def get_tools(dept_name): return unique_df[unique_df['Department'] == dept_name]['Tool'].unique().tolist()

o c) def get_tools(dept_name): return unique_df.groupby('Department')['Tool'].unique().loc[dept_name].tolist()

o d) Both b and c Answer: d) Both b and c

40. Scenario: To quickly get summary statistics (count, mean, std, min, max, quartiles) for numerical columns
potentially present in unique_df (if any existed), which Pandas method is used?

o a) .info()

o b) .describe()

o c) .head()

o d) .corr() Answer: b) .describe()


41. Scenario: You want to add a column IsHealthDept which is True if Department is 'Health' and False
otherwise. Which is a correct way?

o a) unique_df['IsHealthDept'] = unique_df['Department'] == 'Health'

o b) unique_df['IsHealthDept'] = unique_df['Department'].apply(lambda x: True if x == 'Health' else False)

o c) unique_df['IsHealthDept'] = np.where(unique_df['Department'] == 'Health', True, False)

o d) All of the above Answer: d) All of the above

42. Scenario: You want to filter unique_df to show only the tools used by the 'Health' department OR the
'Education' department. Which code works?

o a) unique_df[(unique_df['Department'] == 'Health') & (unique_df['Department'] == 'Education')]

o b) unique_df[unique_df['Department'].isin(['Health', 'Education'])]

o c) unique_df.query("Department == 'Health' | Department == 'Education'")

o d) Both b and c Answer: d) Both b and c

43. Scenario: After finding the department with the highest variety of 'Analysis' types (let's say its name is
stored in dept_max_variety), how would you filter unique_df to show only the rows corresponding to this
specific department?

o a) unique_df[unique_df['Department'] == dept_max_variety]

o b) unique_df.loc[dept_max_variety] (Incorrect indexing for this)

o c) unique_df.filter(like=dept_max_variety, axis=0) (Incorrect use of filter)

o d) unique_df.groupby('Department').get_group(dept_max_variety)

o e) Both a and d Answer: e) Both a and d

44. Scenario: Imagine the 'Date' column is already converted to datetime objects. How would you find the
number of tools first used specifically in the year 2020?

o a) unique_df[unique_df['Date'].dt.year == 2020].shape[0]

o b) unique_df['Date'].dt.year.value_counts().loc[2020] (Might raise KeyError if no tools from 2020)

o c) sum(unique_df['Date'].dt.year == 2020)

o d) All of the above (with a note about potential KeyError for b) Answer: d) All of the above (with a
note about potential KeyError for b) - a and c are generally safer.

45. Scenario: You want to count how many unique tools have a description ('Tool desc') longer than 50
characters. Which is the correct approach?

o a) unique_df[unique_df['Tool desc'].str.len() > 50]['Tool'].count() (Counts rows, not unique tools)

o b) unique_df[unique_df['Tool desc'].str.len() > 50]['Tool'].nunique()

o c) sum(unique_df['Tool desc'].str.len() > 50) (Counts rows)

o d) len(unique_df[unique_df['Tool desc'].str.len() > 50]) (Counts rows) Answer: b) unique_df[unique_df['Tool desc'].str.len() > 50]['Tool'].nunique()

46. Scenario: How would you calculate the total number of tool entries that are both for the 'Public Safety'
department and have the 'Analysis' type 'Predictive'?

o a) unique_df.query("Department == 'Public Safety' and Analysis == 'Predictive'").shape[0]

o b) len(unique_df[(unique_df['Department'] == 'Public Safety') & (unique_df['Analysis'] == 'Predictive')])

o c) sum((unique_df['Department'] == 'Public Safety') & (unique_df['Analysis'] == 'Predictive'))

o d) All of the above Answer: d) All of the above

47. Scenario: You need to create a Series showing the most frequent 'Analysis' type used by each department.
Which combination of Pandas methods is most suitable?

o a) unique_df.groupby('Department')['Analysis'].apply(lambda s: s.mode()).reset_index(level=1, drop=True) (Mode can return multiple values if tied, so it needs handling; note that GroupBy has no direct .mode() method, hence the apply())

o b) unique_df.groupby('Department')['Analysis'].value_counts().idxmax() (This finds the single most frequent department/analysis combination overall, not one per department)

o c) unique_df.groupby('Department')['Analysis'].describe()['top']

o d) Both a (with handling for ties) and c provide a way to get the most frequent type per department. Answer: d) Both a (with handling for ties) and c provide a way to get the most frequent type per department (c is often simpler if only one mode is needed).
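
Two hedged ways to realize Question 47, assuming ties are resolved by taking the first mode:

# option c: describe() on an object column exposes the modal value as 'top'
most_common = unique_df.groupby('Department')['Analysis'].describe()['top']
# explicit alternative that makes the tie-breaking visible
most_common = unique_df.groupby('Department')['Analysis'].agg(lambda s: s.mode().iloc[0])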

48. Scenario: What pandas command would select all columns except for 'Tool desc' and 'Output' from
unique_df?

o a) unique_df.drop(columns=['Tool desc', 'Output'])

o b) unique_df.select(lambda col: col not in ['Tool desc', 'Output'], axis=1) (Select is not standard)

o c) unique_df.loc[:, ~unique_df.columns.isin(['Tool desc', 'Output'])]

o d) Both a and c Answer: d) Both a and c

49. Scenario: If you wanted to see if any 'Tool' name appears within its own 'Tool desc' column (e.g., tool
'Analyzer' is mentioned in its description), how might you check this for the first 10 rows? (Requires
combining columns row-wise)
o a) unique_df.head(10).apply(lambda row: row['Tool'] in row['Tool desc'], axis=1)

o b) unique_df['Tool'].head(10).isin(unique_df['Tool desc'].head(10)) (Checks whether the Tool name equals the description, not substring containment)

o c) [unique_df['Tool'][i] in unique_df['Tool desc'][i] for i in range(10)] (Assumes a default integer index)

o d) Both a and c (a is more robust to index changes) Answer: d) Both a and c (a is more robust to index changes)

50. Scenario: You want to replace the 'Analysis' type 'Descriptive' with 'Summary' and 'Predictive' with 'Forecast' only in the 'Analysis' column of unique_df. Which command works best?

o a) unique_df['Analysis'].replace({'Descriptive': 'Summary', 'Predictive': 'Forecast'}, inplace=True)

o b) unique_df['Analysis'].map({'Descriptive': 'Summary', 'Predictive': 'Forecast'}) (map() replaces non-matching values with NaN unless a default is supplied)

o c) unique_df.replace({'Analysis': {'Descriptive': 'Summary', 'Predictive': 'Forecast'}}, inplace=True)

o d) Both a and c Answer: d) Both a and c (a targets the column's Series directly; c scopes the mapping to the 'Analysis' column via a nested dictionary).
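
A sketch of the two equivalent replacements from Question 50; unlike map(), replace() leaves unmapped values untouched:

unique_df['Analysis'] = unique_df['Analysis'].replace(
    {'Descriptive': 'Summary', 'Predictive': 'Forecast'}
)
# or, scoped through a per-column dictionary on the whole frame:
# unique_df = unique_df.replace({'Analysis': {'Descriptive': 'Summary', 'Predictive': 'Forecast'}})
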
Python Practical MCQs — Al-Beruni City Case Study Context

Data Loading & Library Import MCQs

MCQ 1

Which command is correct to import Pandas library for data manipulation?

A) import panda as pd
B) import pandas as pd
C) import pandas.dataframe
D) from pandas import *

Answer: B

MCQ 2

If the file Departments.csv is located in the working directory, which code will load it into DataFrame df1?

A) df1 = pd.readfile('Departments.csv')
B) df1 = pd.read_csv('Departments.csv')
C) df1 = pd.load_csv('Departments.csv')
D) df1 = pd.readfile_csv('Departments.csv')

Answer: B

MCQ 3

To load Tools.csv file in df2 and view the first row only:

A) df2.head()
B) df2.loc[0]
C) df2.iloc[0]
D) df2.head(1)

Answer: D

MCQ 4

Which Python library is essential for numerical analysis and array operations in this assignment?

A) Matplotlib
B) Pandas
C) NumPy
D) Seaborn

Answer: C

MCQ 5

Which of the following is the correct command to display column names of df1?
A) df1.column_names()
B) df1.columns
C) df1.column()
D) df1.col_names

Answer: B

Data Cleaning & Merging MCQs

MCQ 6

To merge df1 and df2 such that all rows of df2 must appear in merged data:

A) pd.merge(df1, df2, how="inner")
B) pd.merge(df1, df2, how="outer")
C) pd.merge(df1, df2, how="right")
D) pd.merge(df1, df2, how="left")

Answer: C

MCQ 7

To check total missing values in each column of a dataframe:

A) df.isnull().sum()
B) df.isnull.count()
C) df.checknull()
D) df.isna()

Answer: A

MCQ 8

If 'Department' column has missing values and we have a dictionary mapping, which method should be used to fill
it?

A) fillna()
B) map().fillna()
C) replace()
D) dropna()

Answer: B

MCQ 9

After filling missing values, which command ensures there are no missing values?

A) df.isna().sum()
B) df.isnull().sum() == 0
C) df.dropna()
D) df.notnull().sum()
Answer: B

MCQ 10

To save updated dataframe to a new csv file without index:

A) df.to_csv('filename.csv')
B) df.to_csv('filename.csv', index=True)
C) df.save_csv('filename.csv')
D) df.to_csv('filename.csv', index=False)

Answer: D

Duplicate Handling MCQs

MCQ 11

Command to remove duplicates based on 'Abb' and 'Tool Name' fields:

A) df.drop_duplicates(['Abb', 'Tool Name'], inplace=True)
B) df.drop_duplicates()
C) df.unique()
D) df.dropna()

Answer: A

MCQ 12

To check the shape of dataframe:

A) df.shape()
B) df.size()
C) df.shape
D) df.count()

Answer: C

MCQ 13

To save the dataframe after removing duplicates as per assignment requirement:

A) df.to_csv('CRN-Unique.csv', index=False)
B) df.save('CRN-Unique.csv')
C) df.save_csv('CRN-Unique.csv')
D) df.to_csv('CRN_Unique.csv')

Answer: A

Data Analysis MCQs

MCQ 14
To find the department with the highest variety of 'Analysis' types:

A) df['Analysis'].value_counts()
B) df.groupby('Department')['Analysis'].nunique().idxmax()
C) df.groupby('Analysis')['Department'].count()
D) df['Department'].nunique()

Answer: B

MCQ 15

To calculate % of tools updated:

A) (df['Updated']=='Yes').sum() / len(df) * 100
B) df['Updated'].mean()
C) df['Updated'].value_counts() / df['Updated'].sum()
D) df['Updated'].count() / df['Updated'].sum()

Answer: A
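
A sketch of the winning formula, assuming the 'Updated' column stores 'Yes'/'No' strings as option A implies:

pct = (df['Updated'] == 'Yes').sum() / len(df) * 100
print(f'{pct:.1f}% of tool entries are marked as updated')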

Data Visualization MCQs

MCQ 16

To plot the count of tools per department using Matplotlib:

A) df['Department'].plot(kind='bar')
B) df['Department'].value_counts().plot(kind='bar')
C) sns.barplot(x='Department', y='Tool Name', data=df)
D) plt.bar(df['Department'], df['Tool Name'])

Answer: B

MCQ 17

To generate a pie chart for the 'Analysis' column:

A) df['Analysis'].value_counts().plot.pie()
B) plt.pie(df['Analysis'])
C) df['Analysis'].plot(kind='pie')
D) sns.pieplot(df['Analysis'])

Answer: A

MCQ 18

To create a bar plot for tools marked as "Updated" using Seaborn:

A) sns.countplot(x='Updated', data=df)
B) sns.barplot(x='Updated', y='Tool Name', data=df)
C) df['Updated'].plot(kind='bar')
D) plt.bar('Updated', 'Tool Name')
Answer: A

MCQ 19

To rotate x-axis labels by 45 degrees for better readability:

A) plt.xticks(rotation=45)
B) plt.xlabels(45)
C) plt.xlabel(rotation=45)
D) plt.rotate(45)

Answer: A

MCQ 20

To add values inside the slices of a pie chart in Matplotlib:

A) autopct='%1.1f%%'
B) data_label='inside'
C) plt.labels(inside=True)
D) df.plot.pie(labels='inside')

Answer: A
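
A sketch combining MCQs 17, 19, and 20: a labelled pie chart and a bar chart with rotated tick labels, assuming df carries the columns named above:

import matplotlib.pyplot as plt

df['Analysis'].value_counts().plot.pie(autopct='%1.1f%%')  # percentages inside the slices
plt.ylabel(''); plt.show()

df['Department'].value_counts().plot(kind='bar')
plt.xticks(rotation=45)  # tilt x-axis labels for readability
plt.tight_layout(); plt.show()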

MCQ 21

To find total number of tools used per department after removing duplicates:

A) df['Department'].value_counts()
B) df.groupby('Department')['Tool Name'].count()
C) df.groupby('Tool Name')['Department'].count()
D) df['Tool Name'].count()

Answer: B

MCQ 22

To calculate mean of numerical column 'Population':

A) df['Population'].mean()
B) df.mean('Population')
C) np.mean(df['Population'])
D) Both A and C

Answer: D

MCQ 23

To calculate the standard deviation of a numerical column:

A) df['Population'].std()
B) df.std('Population')
C) np.std(df['Population'])
D) Both A and C

Answer: D

MCQ 24

To check correlation between numerical columns in dataframe:

A) df.corr()
B) df.correlation()
C) df.cov()
D) df.group_corr()

Answer: A

MCQ 25

To apply filter and show records where 'Updated' is 'No':

A) df[df['Updated'] == 'No']
B) df.loc[df['Updated'] == 'No']
C) df.query("Updated == 'No'")
D) All of the above

Answer: D

MCQ 26

In an ETL process, which is part of "Extract" in Python?

A) Reading a CSV using pandas
B) Using an API to fetch data
C) SQL query execution
D) All of the above

Answer: D

MCQ 27

In "Transform" step of ETL, you perform:

A) Handling Missing Values
B) Data Cleaning
C) Column Renaming
D) All of the above

Answer: D
MCQ 28

To calculate median of numerical column:

A) df['Population'].median()
B) df.median('Population')
C) np.median(df['Population'])
D) Both A and C

Answer: D

MCQ 29

To remove unwanted whitespaces from string columns:

A) df['Column'] = df['Column'].str.strip()
B) df['Column'] = df['Column'].strip()
C) df['Column'] = df.strip('Column')
D) df['Column'].remove_whitespace()

Answer: A

MCQ 30

To replace missing values with 'Unknown':

A) df.fillna('Unknown')
B) df.replace(np.nan, 'Unknown')
C) df['Column'] = df['Column'].fillna('Unknown')
D) All of the above

Answer: D

MCQ 31

Which pandas function returns basic statistics like count, mean, std, min, max?

A) df.stats()
B) df.describe()
C) df.summary()
D) df.explain()

Answer: B

MCQ 32

To reset index after dropping rows:

A) df.reset_index()
B) df.reset_index(drop=True, inplace=True)
C) df.index_reset()
D) df.drop_index()
Answer: B

MCQ 33

Which visualization is best to show distribution of numerical column?

A) Line Chart
B) Histogram
C) Pie Chart
D) Bar Chart

Answer: B

MCQ 34

To export only selected columns to CSV:

A) df[['col1', 'col2']].to_csv('output.csv', index=False)
B) df.select(['col1','col2']).to_csv('output.csv')
C) df[['col1','col2']].save('output.csv')
D) df['col1','col2'].export_csv()

Answer: A

MCQ 35

In Market Basket Analysis using Python, which library is commonly used?

A) pandas
B) numpy
C) mlxtend
D) seaborn

Answer: C

MCQ 36

For predictive maintenance model, which technique is preferred?

A) Regression
B) Clustering
C) Classification
D) Time Series Forecasting

Answer: D

MCQ 37

Which is the correct way to create a new calculated column in a dataframe?

A) df.create_column()
B) df['New_Column'] = ...
C) df.new_column()
D) df.add_column()

Answer: B

MCQ 38

Which method provides the highest-level summary of a dataframe?

A) df.info()
B) df.summary()
C) df.describe()
D) df.columns

Answer: A

MCQ 39

In scatter plot to show correlation between two numerical columns:

A) sns.scatterplot(x='col1', y='col2', data=df)
B) plt.scatter(df['col1'], df['col2'])
C) Both A and B
D) df.plot.scatter('col1','col2')

Answer: C

MCQ 40

To group data by Department and calculate sum of 'Population':

A) df.groupby('Department')['Population'].sum()
B) df.sum('Population').groupby('Department')
C) df.group('Department').sum('Population')
D) groupby(df['Department'])

Answer: A

MCQ 41

To create a bar chart using Seaborn:

A) sns.barplot(x='Department', y='Population', data=df)
B) sns.histplot(x='Department', y='Population', data=df)
C) sns.countplot(x='Department', data=df)
D) plt.bar(x='Department', height='Population')

Answer: A
MCQ 42

In fraud detection model using Python, which technique is most suitable?

A) Clustering
B) Classification
C) Time Series
D) PCA

Answer: B

MCQ 43

To drop rows where all values are NaN:

A) df.dropna(how='all')
B) df.dropna(all=True)
C) df.drop_allna()
D) df.remove_blank_rows()

Answer: A

MCQ 44

To create pivot table in pandas:

A) df.pivot_table(index='Department', values='Population', aggfunc='sum')
B) df.pivot('Department', 'Population')
C) df.create_pivot('Department')
D) df.group_pivot()

Answer: A
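
A sketch of the pivot from MCQ 44, assuming a numeric 'Population' column as in MCQ 22; it matches the groupby sum from MCQ 40 but returns a DataFrame:

pivot = df.pivot_table(index='Department', values='Population', aggfunc='sum')
print(pivot.equals(df.groupby('Department')[['Population']].sum()))  # True when dtypes and ordering match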

MCQ 45

To calculate variance of numerical column:

A) df['Population'].var()
B) np.var(df['Population'])
C) df.var('Population')
D) Both A and B

Answer: D

MCQ 46

To filter dataframe where Population > 1000:

A) df[df['Population'] > 1000]
B) df.loc[df['Population'] > 1000]
C) df.query("Population > 1000")
D) All of the above
Answer: D

MCQ 47

To show last 5 rows of dataframe:

A) df.tail()
B) df.head()
C) df[-5:]
D) Both A and C

Answer: D

MCQ 48

For supply chain optimization model using Python, which technique is used?

A) Linear Programming
B) Clustering
C) Regression
D) Market Basket

Answer: A

MCQ 49

To check datatype of all columns:

A) df.dtypes
B) df.types()
C) df.datatypes()
D) df.columns.dtypes

Answer: A

MCQ 50

To convert a column to datetime format:

A) pd.to_datetime(df['Date'])
B) df['Date'].astype('datetime')
C) df['Date'].convert('datetime')
D) Both A and B

Answer: A
