Mastering Data Analyst Interview Scenarios
Interviewer: How would you create a new column
based on the year of a datetime column? If it's this
year, assign 0; if it's last year, assign 1.
Candidate:
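import datetime
current_year = datetime.datetime.now().year
# 0 for this year, 1 for last year; any other year is left missing
df['YearFlag'] = df['Date'].apply(
    lambda x: 0 if x.year == current_year
    else (1 if x.year == current_year - 1 else None))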
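The same flag can be computed without a row-wise apply. This is a minimal sketch, assuming 'Date' is already a datetime column; the sample frame is made up, and leaving older years as missing is only one possible choice, since the question does not say how to treat them.

```python
import pandas as pd

current_year = pd.Timestamp.now().year

# Hypothetical sample data: one date in each of the last three years
df = pd.DataFrame({
    "Date": pd.to_datetime([
        f"{current_year}-01-15",
        f"{current_year - 1}-06-30",
        f"{current_year - 2}-03-01",
    ])
})

# Map year -> flag; any year not in the mapping becomes NaN
df["YearFlag"] = df["Date"].dt.year.map({current_year: 0, current_year - 1: 1})
```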
Interviewer: How would you handle the situation
where you need to convert a string column
representing dates in yyyy-mm-dd format into
actual datetime objects?
Candidate:
df['Date'] = pd.to_datetime(
df['Date'], format='%Y-%m-%d')
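Real data often contains a few strings that do not match the format. A short sketch, with a made-up sample frame and a hypothetical n_invalid counter, of how errors='coerce' could be used to keep the conversion from failing:

```python
import pandas as pd

# Hypothetical sample: the last entry is not a valid yyyy-mm-dd date
df = pd.DataFrame({"Date": ["2024-01-15", "2023-12-31", "not a date"]})

# errors="coerce" turns unparseable strings into NaT instead of raising
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d", errors="coerce")

# Count how many values failed to parse
n_invalid = df["Date"].isna().sum()
```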
Interviewer: How can you create a new column in a
DataFrame based on a condition from another
column? For example, if the value in column 'A' is
greater than 10, assign 'High' to a new column,
otherwise assign 'Low'.
Candidate:
df['NewColumn'] = df['A'].apply(lambda x: 'High' if x > 10 else 'Low')
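For a simple two-way condition, numpy.where is a common vectorized alternative to apply. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [5, 10, 11, 42]})  # hypothetical sample values

# np.where evaluates the condition on the whole column at once
df["NewColumn"] = np.where(df["A"] > 10, "High", "Low")
```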
Interviewer: How can you merge two DataFrames
on multiple columns with different names, while
keeping all the data from both DataFrames?
Candidate:
We can use the merge() function
with the left_on and right_on
parameters to specify the different
column names, and how='outer' to
keep all rows from both DataFrames.
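A sketch of such a merge. The frames orders and targets and all their column names are invented for illustration; how='outer' is what keeps unmatched rows from both sides.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names
orders = pd.DataFrame({
    "cust_id": [1, 2, 3],
    "order_year": [2023, 2023, 2024],
    "amount": [100, 250, 75],
})
targets = pd.DataFrame({
    "customer": [1, 2, 4],
    "year": [2023, 2023, 2024],
    "target": [120, 200, 90],
})

# left_on/right_on pair up the differently named keys;
# how="outer" keeps unmatched rows from both sides
merged = pd.merge(
    orders, targets,
    left_on=["cust_id", "order_year"],
    right_on=["customer", "year"],
    how="outer",
)
```

Because the key names differ, both sets of key columns appear in the result, with missing values where a row had no match.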
Candidate:
# Drop rows that have a missing value in column1 or column2
df.dropna(subset=['column1', 'column2'], inplace=True)
Interviewer: How can you create a pivot table to
find the average of 'Sales' grouped by 'Region' and
'Month', and also include a column that counts the
number of transactions?
Candidate:
# Both aggregated columns must be listed in values
pivot = df.pivot_table(values=['Sales', 'TransactionID'],
                       index=['Region', 'Month'],
                       aggfunc={'Sales': 'mean', 'TransactionID': 'count'})
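To see the shape of the result, here is the same kind of pivot on a small made-up set of transactions (the data is hypothetical):

```python
import pandas as pd

# Hypothetical transactions
df = pd.DataFrame({
    "TransactionID": [1, 2, 3, 4],
    "Region": ["East", "East", "East", "West"],
    "Month": ["Jan", "Jan", "Feb", "Jan"],
    "Sales": [100.0, 200.0, 50.0, 80.0],
})

# Mean of Sales and count of transactions per (Region, Month)
pivot = df.pivot_table(
    index=["Region", "Month"],
    values=["Sales", "TransactionID"],
    aggfunc={"Sales": "mean", "TransactionID": "count"},
)
```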
Interviewer: How would you apply a function that
uses multiple columns in a DataFrame and returns
a new column, say, summing two columns, 'A' and
'B'?
Candidate:
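One common approach, sketched here with a made-up frame: add the columns directly when the operation is vectorizable, or use apply with axis=1 for an arbitrary row-wise function. The column names Sum and SumApply are placeholders.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})  # hypothetical data

# Vectorized: usually the fastest option for simple arithmetic
df["Sum"] = df["A"] + df["B"]

# Row-wise: axis=1 passes each row to the function as a Series
df["SumApply"] = df.apply(lambda row: row["A"] + row["B"], axis=1)
```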
Candidate:
You can use the groupby() method
and then apply the custom
aggregation function using agg():
df.groupby('Category')['Amount'].agg(lambda x: x.max() - x.min())
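A named function can stand in for the lambda, which also gives the aggregation a readable name. A sketch with made-up data; value_range is a hypothetical helper:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "B"],
    "Amount": [10, 25, 5, 8, 30],
})

def value_range(x):
    """Spread between the largest and smallest value in a group."""
    return x.max() - x.min()

amount_range = df.groupby("Category")["Amount"].agg(value_range)
```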
Interviewer: If you want to calculate the rolling
mean of a column with a window size of 7, how
would you do it?
Candidate:
df['RollingMean'] = df['Sales'].rolling(window=7).mean()
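With a window of 7, the first six rows have no full window and come out as NaN. A sketch on made-up data showing that behavior, plus the min_periods option for partial windows (RollingMeanPartial is a placeholder name):

```python
import pandas as pd

df = pd.DataFrame({"Sales": [10, 20, 30, 40, 50, 60, 70, 80]})

# Default: the first 6 rows are NaN because the window is not yet full
df["RollingMean"] = df["Sales"].rolling(window=7).mean()

# min_periods=1 averages whatever values are available so far
df["RollingMeanPartial"] = df["Sales"].rolling(window=7, min_periods=1).mean()
```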
Interviewer: How would you handle duplicates in a
DataFrame and keep only the first occurrence of
each duplicate row?
Candidate:
df.drop_duplicates(keep='first', inplace=True)
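By default a row counts as a duplicate only when every column matches. A sketch with made-up data that also shows subset, for cases where duplicates are defined by a key column (the id column is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "value": ["a", "a", "b", "c"],
})

# Rows are duplicates only if every column matches
deduped = df.drop_duplicates(keep="first")

# subset= restricts the comparison to the listed columns
deduped_by_id = df.drop_duplicates(subset=["id"], keep="first")
```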
Interviewer: How can you handle a situation where
you have a multi-index DataFrame and you need to
reset it back to a flat DataFrame?
Candidate:
df_reset = df.reset_index()
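A small illustration on a made-up two-level index, showing that each index level turns into an ordinary column:

```python
import pandas as pd

# Hypothetical frame with a two-level index
df = pd.DataFrame(
    {"Sales": [100, 200, 300]},
    index=pd.MultiIndex.from_tuples(
        [("East", "Jan"), ("East", "Feb"), ("West", "Jan")],
        names=["Region", "Month"],
    ),
)

# Every index level becomes an ordinary column
df_reset = df.reset_index()
```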
Interviewer: How can you get the top N rows for
each group in a DataFrame?
Candidate:
df.groupby('Category').head(3)
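Note that head(n) returns the first n rows of each group in their existing order. If "top" means the largest values of some column, sort first. A sketch assuming a hypothetical Sales column as the ranking key:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B"],
    "Sales": [10, 50, 30, 5, 40],
})

# Sort so the largest Sales come first, then take the first 2 rows per group
top2 = df.sort_values("Sales", ascending=False).groupby("Category").head(2)
```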
Interviewer: How do you perform a cross join
between two DataFrames?
Candidate:
df1['key'] = 1
df2['key'] = 1
df_cross = pd.merge(df1, df2, on='key').drop('key', axis=1)
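In pandas 1.2 and later, merge also accepts how='cross', which avoids adding a temporary key column to the inputs. A sketch with made-up frames:

```python
import pandas as pd

df1 = pd.DataFrame({"color": ["red", "blue"]})
df2 = pd.DataFrame({"size": ["S", "M", "L"]})

# how="cross" pairs every row of df1 with every row of df2
df_cross = pd.merge(df1, df2, how="cross")
```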
Interviewer: How would you count the number of
occurrences of each value in a column?
Candidate:
df['Category'].value_counts()
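Two options are worth knowing here: missing values are excluded unless dropna=False is passed, and normalize=True returns proportions instead of counts. A sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"Category": ["A", "B", "A", None, "A"]})

# Missing values are excluded by default
counts = df["Category"].value_counts()

# dropna=False adds a row for the missing values
counts_with_na = df["Category"].value_counts(dropna=False)

# normalize=True gives each value's share of the non-missing rows
shares = df["Category"].value_counts(normalize=True)
```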
Interviewer: How can you filter rows where a string
column contains a specific substring?
Candidate:
df_filtered = df[df['ProductName'].str.contains('Laptop')]
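If the column has missing values, the boolean mask will contain missing entries too and the filter can fail. A sketch, on made-up data, using na=False for that case and case=False for case-insensitive matching:

```python
import pandas as pd

df = pd.DataFrame({"ProductName": ["Gaming Laptop", "laptop bag", "Mouse", None]})

# case=False ignores letter case; na=False treats missing names as non-matches
mask = df["ProductName"].str.contains("Laptop", case=False, na=False)
df_filtered = df[mask]
```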
Interviewer: How would you handle missing values
in a column and replace them with the mean of
that column?
Candidate:
df['Column'] = df['Column'].fillna(df['Column'].mean())
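A worked example on made-up numbers. mean() ignores missing values by default, so the fill value is the average of the values that are present:

```python
import pandas as pd

df = pd.DataFrame({"Column": [10.0, None, 30.0]})

# mean() skips NaN by default, so the fill value here is (10 + 30) / 2
df["Column"] = df["Column"].fillna(df["Column"].mean())
```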
Follow Us on Linkedin:
Aditya Chandak