Python - Pandas_Numpy Interview Q&A
Note: For Data Analyst role interviews, you will typically face two different kinds
of questions:
---
Sample DataFrame:
---
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.to_period('M')
monthly_sales = df.groupby('Month')['Sales_Amount'].sum()
mom_change = monthly_sales.pct_change() * 100
---
df['Cumulative_Sales'] = df.groupby('Salesperson_ID')['Sales_Amount'].cumsum()
---
top_3_salespersons = df.groupby('Salesperson_ID')['Sales_Amount'].sum().nlargest(3)
---
df['Rolling_Avg_Sales'] = df.groupby('Salesperson_ID')['Sales_Amount'].rolling(window=3).mean().reset_index(level=0, drop=True)
---
Sample DataFrame:
---
6. Find the Median Salary for Each Department
Task: Calculate the median salary for each department (Dept_ID).
Solution:
Group by Dept_ID and use the median() function on Salary.
median_salary = df.groupby('Dept_ID')['Salary'].median()
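The sample table for this question did not survive in this copy, so the following is only a minimal sketch on made-up data. The column names Dept_ID and Salary come from the task; Emp_ID and all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data (Emp_ID and all values are assumed, not from the source).
df = pd.DataFrame({
    "Emp_ID": [1, 2, 3, 4, 5],
    "Dept_ID": ["D1", "D1", "D1", "D2", "D2"],
    "Salary": [50000, 60000, 90000, 40000, 70000],
})

# One median per department; the result is a Series indexed by Dept_ID.
median_salary = df.groupby("Dept_ID")["Salary"].median()
print(median_salary)
```

With an even number of rows in a group (D2 here), median() averages the two middle values, so the result can be a salary that no employee actually earns.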
---
df['Join_Date'] = pd.to_datetime(df['Join_Date'])
# .shape[0] gives a single row count; .count() would return one count per column
employees_before_2020 = df[df['Join_Date'] < '2020-01-01'].shape[0]
---
top_employee = df.loc[df['Projects_Completed'].idxmax()]
---
df['Salary_Rank'] = df['Salary'].rank(ascending=False)
---
Sample DataFrame:
---
12. Calculate Q1-to-Q4 Sales Growth
Task: Calculate the percentage growth from Q1 to Q4 for each product.
Solution:
Compute the growth percentage using the formula (Sales_Q4 - Sales_Q1) /
Sales_Q1 * 100.
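The formula above can be sketched as a single vectorized column expression. Sales_Q1 and Sales_Q4 are the column names used in the formula; Product_ID, Growth_Pct, and the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data (Product_ID and all values are assumed).
df = pd.DataFrame({
    "Product_ID": ["P1", "P2", "P3"],
    "Sales_Q1": [100.0, 200.0, 80.0],
    "Sales_Q4": [150.0, 150.0, 80.0],
})

# Percentage growth from Q1 to Q4, computed row by row without a loop.
df["Growth_Pct"] = (df["Sales_Q4"] - df["Sales_Q1"]) / df["Sales_Q1"] * 100
print(df[["Product_ID", "Growth_Pct"]])
```

If Sales_Q1 can be zero, the division yields inf (or NaN for 0/0) rather than raising an error, so such rows may need separate handling before the result is reported.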
---
df['Sales_Rank'] = df['Total_Sales'].rank(ascending=False)
---
Sample DataFrame:
---
total_spend = df.groupby('Customer_ID')['Amount_Spent'].sum()
---
customers_multiple_purchases = df.groupby('Customer_ID').filter(lambda x: len(x) > 1)
---
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(by=['Customer_ID', 'Date'])
df['Days_Between'] = df.groupby('Customer_ID')['Date'].diff().dt.days
---
top_product = df.groupby('Product_ID')['Amount_Spent'].sum().idxmax()
---
df['Running_Total'] = df.groupby('Customer_ID')['Amount_Spent'].cumsum()
---
14. What are the key differences between Pandas and NumPy?
- Pandas provides higher-level data structures (Series, DataFrame) with labeled
axes and is more suited for handling structured data, while NumPy is focused on
efficient numerical computations with arrays.
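A short sketch may make the contrast concrete. All variable names and values below are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# NumPy: a homogeneous array, addressed by integer position.
arr = np.array([10.0, 20.0, 30.0])
arr_scaled = arr * 2            # vectorized element-wise arithmetic
second_value = arr[1]           # positional access

# Pandas: the same values carried with a labeled index.
s = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])
value_b = s["b"]                # label-based access

# Arithmetic between Series aligns on labels, not on position.
other = pd.Series([1.0, 2.0, 3.0], index=["c", "b", "a"])
aligned_sum = s + other         # a: 10 + 3, b: 20 + 2, c: 30 + 1

# A DataFrame can mix column dtypes; a numeric column can still be
# handed to NumPy as a plain array.
df = pd.DataFrame({"name": ["x", "y"], "score": [1.5, 2.5]})
score_array = df["score"].to_numpy()
```

In an interview, the label alignment in `s + other` is a useful point to mention: the equivalent NumPy addition would pair elements purely by position.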