Mastering Data Analyst Interview Scenarios

The document outlines a series of interview scenarios for a Data Analyst position, where an interviewer poses various technical questions. The candidate provides concise solutions using Python and pandas functions for data manipulation tasks such as creating new columns, merging DataFrames, handling missing values, and filtering data. Each response includes example code snippets demonstrating the candidate's proficiency in data analysis techniques.

Uploaded by

Deeksha Shetty

Data Analyst Interview Scenarios

Interviewer: How would you create a new column
based on the year of a datetime column? If it's this
year, assign 0; if it's last year, assign 1.

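Candidate: You can use apply() to extract the year and compare it to the current year. Check for last year explicitly; otherwise every older date would also be flagged 1.

import datetime
current_year = datetime.datetime.now().year
# 0 for this year, 1 for last year; any other year is left missing
df['YearFlag'] = df['Date'].apply(
    lambda x: 0 if x.year == current_year
    else 1 if x.year == current_year - 1 else None)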
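
Note: the same flag can also be built without a row-wise apply(). The sketch below is one possible vectorized variant, not part of the original answer; the sample DataFrame and the name year_to_flag are made up for illustration.

```python
import datetime

import pandas as pd

# Hypothetical sample data: one date from this year, one from last year, one older.
current_year = datetime.datetime.now().year
df = pd.DataFrame({
    'Date': pd.to_datetime([
        f'{current_year}-01-15',
        f'{current_year - 1}-06-30',
        f'{current_year - 3}-03-01',
    ])
})

# Map the extracted year to a flag; years not in the mapping become NaN.
year_to_flag = {current_year: 0, current_year - 1: 1}
df['YearFlag'] = df['Date'].dt.year.map(year_to_flag)
```

Because the older row has no flag, the column comes back as floats with NaN; whether that or a default value is preferable depends on how the interviewer wants older years treated.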
Interviewer: How would you handle the situation
where you need to convert a string column
representing dates in yyyy-mm-dd format into
actual datetime objects?

Candidate: You can use pd.to_datetime() to convert the string column into datetime objects. Passing the format explicitly avoids ambiguous parsing.

import pandas as pd
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
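
Note: real columns often contain a few strings that do not match the format. A minimal sketch, assuming it is acceptable to turn such entries into missing values, uses errors='coerce'; the sample data and the name invalid_rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with one malformed entry.
df = pd.DataFrame({'Date': ['2024-01-15', 'unknown', '2024-03-01']})

# Entries that do not match the format become NaT instead of raising an error.
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d', errors='coerce')

# Count how many values could not be parsed, so they can be inspected later.
invalid_rows = df['Date'].isna().sum()
```

Without errors='coerce', the same call would raise on the malformed string, which may be the safer default when bad data should stop the pipeline.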
Interviewer: How can you create a new column in a
DataFrame based on a condition from another
column? For example, if the value in column 'A' is
greater than 10, assign 'High' to a new column,
otherwise assign 'Low'.

Candidate: You can use the apply() function or np.where() for this.

df['NewColumn'] = df['A'].apply(lambda x: 'High' if x > 10 else 'Low')
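
Note: the answer mentions np.where() without showing it. A short sketch of that variant, on a made-up DataFrame, could look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'A': [5, 10, 11, 42]})

# np.where evaluates the condition for the whole column at once:
# 'High' where A > 10, 'Low' everywhere else (including A == 10).
df['NewColumn'] = np.where(df['A'] > 10, 'High', 'Low')
```

On large columns this is usually faster than apply() with a lambda, because the comparison runs on the whole array rather than one value at a time.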
Interviewer: How can you merge two DataFrames
on multiple columns with different names, while
keeping all the data from both DataFrames?

Candidate: We can use the merge() function with the left_on and right_on parameters to specify different column names for merging, and how='outer' to keep all rows from both DataFrames.

df_merged = df1.merge(df2, left_on=['col1', 'col2'],
                      right_on=['colA', 'colB'], how='outer')
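
Note: a small runnable sketch of this outer merge, with made-up data. The indicator=True argument is an addition for illustration; it records which side each row came from. The value columns left_val and right_val are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: only the pair (2, 'y') exists on both sides.
df1 = pd.DataFrame({'col1': [1, 2], 'col2': ['x', 'y'], 'left_val': [10, 20]})
df2 = pd.DataFrame({'colA': [2, 3], 'colB': ['y', 'z'], 'right_val': [200, 300]})

# The outer merge keeps unmatched rows from both sides; indicator=True adds
# a '_merge' column with 'left_only', 'right_only' or 'both'.
merged = df1.merge(
    df2,
    left_on=['col1', 'col2'],
    right_on=['colA', 'colB'],
    how='outer',
    indicator=True,
)
```

Because the key columns have different names, the result keeps all four of them (col1, col2, colA, colB), with NaN where a side had no match.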
Interviewer: What would you do if you want to drop
rows that have NaN values in a specific subset of
columns, but not in the entire DataFrame?

Candidate: We can use the dropna() method with the subset parameter to specify the columns.

df.dropna(subset=['column1', 'column2'], inplace=True)
Interviewer: How can you create a pivot table to
find the average of 'Sales' grouped by 'Region' and
'Month', and also include a column that counts the
number of transactions?

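Candidate: You can use the pivot_table() function with multiple aggregation functions. Every column named in aggfunc must also be listed in values (or values must be omitted), otherwise pandas cannot find it.

pivot = df.pivot_table(values=['Sales', 'TransactionID'],
                       index=['Region', 'Month'],
                       aggfunc={'Sales': 'mean', 'TransactionID': 'count'})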
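
Note: a runnable sketch of this pivot on made-up data. The final rename and the names AvgSales and TransactionCount are illustrative additions.

```python
import pandas as pd

# Hypothetical sample data: five transactions across two regions.
df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'West'],
    'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb'],
    'Sales': [100.0, 200.0, 50.0, 70.0, 90.0],
    'TransactionID': [1, 2, 3, 4, 5],
})

# Average Sales and count of transactions per (Region, Month).
pivot = df.pivot_table(
    values=['Sales', 'TransactionID'],
    index=['Region', 'Month'],
    aggfunc={'Sales': 'mean', 'TransactionID': 'count'},
)

# Rename so the columns describe what they now hold.
pivot = pivot.rename(columns={'Sales': 'AvgSales',
                              'TransactionID': 'TransactionCount'})
```

The result is indexed by the (Region, Month) pairs that occur in the data, for example ('East', 'Jan') with an average of 150.0 over 2 transactions.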
Interviewer: How would you apply a function that
uses multiple columns in a DataFrame and returns
a new column, say, summing two columns, 'A' and
'B'?

Candidate: You can use apply() with axis=1 to apply a function across rows.

df['SumAB'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
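
Note: for simple arithmetic such as a sum, plain column operations give the same result as a row-wise apply() and are typically faster. The sketch below uses made-up data with one missing value to show how two common variants differ; the column name SumSkipNaN is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the last value in 'A' is missing.
df = pd.DataFrame({'A': [1.0, 2.0, None], 'B': [10, 20, 30]})

# Element-wise addition: a missing value on either side gives NaN.
df['SumAB'] = df['A'] + df['B']

# Row-wise sum over the two columns: missing values are skipped by default.
df['SumSkipNaN'] = df[['A', 'B']].sum(axis=1)
```

apply() with axis=1 remains useful when the per-row logic cannot be expressed as column operations.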
Interviewer: How would you filter rows in a
DataFrame where a specific column contains
values from a list of options?

Candidate: You can use the isin() method to filter rows.

df_filtered = df[df['Category'].isin(['A', 'B', 'C'])]

Interviewer: How can you group the DataFrame by
one column and apply a custom aggregation
function to another column?

Candidate: You can use the groupby() method and then apply the custom aggregation function using agg().

df.groupby('Category')['Amount'].agg(lambda x: x.max() - x.min())
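
Note: a named function works the same way as the lambda and can make the intent easier to read. A small sketch on made-up data, where value_range is a hypothetical helper:

```python
import pandas as pd


def value_range(values: pd.Series) -> float:
    """Custom aggregation: spread between the largest and smallest value."""
    return values.max() - values.min()


# Hypothetical sample data.
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'B'],
    'Amount': [10, 25, 7, 3, 12],
})

# agg() calls value_range once per group, passing that group's 'Amount' values.
ranges = df.groupby('Category')['Amount'].agg(value_range)
```

Here ranges is a Series indexed by Category, holding 15 for 'A' and 9 for 'B'.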
Interviewer: If you want to calculate the rolling
mean of a column with a window size of 7, how
would you do it?

Candidate: You can use the rolling() function followed by mean().

df['RollingMean'] = df['Sales'].rolling(window=7).mean()
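
Note: with window=7 the first six rows have no full window, so their rolling mean is NaN. The sketch below, on made-up data, shows that behavior next to the min_periods=1 variant, which averages whatever values are available so far; the column name RollingMeanPartial is hypothetical.

```python
import pandas as pd

# Hypothetical sample data: ten consecutive daily sales figures.
df = pd.DataFrame({'Sales': list(range(1, 11))})

# Full windows only: rows 0-5 are NaN, row 6 is the mean of 1..7.
df['RollingMean'] = df['Sales'].rolling(window=7).mean()

# Partial windows allowed: row 0 is just its own value, row 1 the mean of two.
df['RollingMeanPartial'] = df['Sales'].rolling(window=7, min_periods=1).mean()
```

Which variant fits depends on whether early partial averages are meaningful for the analysis.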
Interviewer: How would you handle duplicates in a
DataFrame and keep only the first occurrence of
each duplicate row?

Candidate: You can use the drop_duplicates() method and specify keep='first' to keep the first occurrence.

df.drop_duplicates(keep='first', inplace=True)
Interviewer: How can you handle a situation where
you have a multi-index DataFrame and you need to
reset it back to a flat DataFrame?

Candidate: You can use reset_index() to flatten a multi-index DataFrame.

df_reset = df.reset_index()
Interviewer: How can you get the top N rows for
each group in a DataFrame?

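Candidate: You can use groupby() with head(). This returns the first N rows of each group in the DataFrame's current order, so if "top" means the largest values, sort by the ranking column first.

df.groupby('Category').head(3)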
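
Note: a runnable sketch of "top N by value" on made-up data, assuming the ranking column is called Amount (a hypothetical name here) and N is 2:

```python
import pandas as pd

# Hypothetical sample data, deliberately not sorted by Amount.
df = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Amount': [5, 50, 20, 7, 1, 9],
})

# Sort so the largest amounts come first, then keep the first 2 rows per group.
top2 = (
    df.sort_values('Amount', ascending=False)
      .groupby('Category')
      .head(2)
)
```

Without the sort, head(2) would return the rows with amounts 5 and 50 for category 'A', which are simply the first two in the original order.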
Interviewer: How do you perform a cross join
between two DataFrames?

Candidate: You can use merge() with an artificial key (like key=1) to perform a cross join. Using assign() adds the key to temporary copies, so df1 and df2 themselves are not modified.

df_cross = pd.merge(df1.assign(key=1), df2.assign(key=1),
                    on='key').drop('key', axis=1)
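
Note: assuming pandas 1.2 or newer, merge() also accepts how='cross', which avoids the artificial key entirely. A short sketch on made-up data:

```python
import pandas as pd

# Hypothetical sample data.
df1 = pd.DataFrame({'Size': ['S', 'M', 'L']})
df2 = pd.DataFrame({'Color': ['red', 'blue']})

# Every row of df1 is paired with every row of df2: 3 x 2 = 6 rows.
df_cross = pd.merge(df1, df2, how='cross')
```

The artificial-key approach is still useful on older pandas versions where how='cross' is not available.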
Interviewer: How would you count the number of
occurrences of each value in a column?

Candidate: You can use the value_counts() method.

df['Category'].value_counts()
Interviewer: How can you filter rows where a string
column contains a specific substring?

Candidate: You can use the str.contains() method for this. If the column may contain missing values, pass na=False so the mask stays boolean.

df_filtered = df[df['ProductName'].str.contains('Laptop', na=False)]
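
Note: str.contains() treats the pattern as a regular expression and is case-sensitive by default. The sketch below, on made-up data, shows a literal, case-insensitive match that also tolerates missing values; the name mask is hypothetical.

```python
import pandas as pd

# Hypothetical sample data, including a missing name and mixed casing.
df = pd.DataFrame({
    'ProductName': ['Gaming Laptop', 'laptop sleeve', None,
                    'Desktop PC', 'USB-C (Laptop) dock'],
})

# case=False ignores casing, regex=False matches the text literally,
# na=False treats missing names as "no match".
mask = df['ProductName'].str.contains('laptop', case=False, na=False, regex=False)
df_filtered = df[mask]
```

regex=False matters most when the substring contains characters such as '(' or '.', which would otherwise be interpreted as regex syntax.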
Interviewer: How would you handle missing values
in a column and replace them with the mean of
that column?

Candidate: You can use fillna() to replace missing values with the column mean. mean() skips missing values by default, so it is computed from the known values only.

df['Column'] = df['Column'].fillna(df['Column'].mean())
