Python Data Insights Using Pandas Interview Q&A

The document provides a comprehensive guide on using Pandas for data analysis, including generating sample data, identifying trends, correlations, and creating visualizations. It also covers how to communicate insights to non-technical stakeholders, support business decision-making, and measure the effectiveness of strategies. Key examples include calculating average sales by product, processing times by department, and ROI for business initiatives.

Uploaded by

yadavsumitsy1003
Copyright
© All Rights Reserved
Pandas Interview Questions & Answers

Data Insight

Sample Data for Analysis


In [1]: import pandas as pd
import numpy as np

# Set seed for reproducibility
np.random.seed(42)

# Create a daily range of dates
dates = pd.date_range(start="2023-01-01", end="2025-01-01", freq='D')
n = len(dates)

# Generate sample data (one row per day)
df = pd.DataFrame({
    'date': dates,
    'sales': np.random.randint(100, 1000, size=n),
    'product': np.random.choice(['Product A', 'Product B', 'Product C'], size=n),
    'department': np.random.choice(['HR', 'Sales', 'IT', 'Operations'], size=n),
    'processing_time': np.random.normal(loc=5, scale=2, size=n).clip(1, 15),
    'customer_id': np.random.randint(1, 500, size=n),
    'initiative': np.random.choice(['None', 'New Campaign'], size=n, p=[0.8, 0.2]),
    'revenue': np.random.uniform(200, 1000, size=n),
    'cost': np.random.uniform(100, 500, size=n)
})

# Save the dataset as CSV
df.to_csv("sample_business_data.csv", index=False)
print("Sample dataset saved as 'sample_business_data.csv'")

Sample dataset saved as 'sample_business_data.csv'
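
Saving to CSV stores dates as plain text, which is why the next answer converts the 'date' column again after loading. The following is a minimal sketch of that round trip, using a small hypothetical frame and an in-memory buffer (`io.StringIO`) instead of a file on disk; the names `demo`, `plain`, and `parsed` are illustrative only.

```python
import io
import pandas as pd

# Hypothetical three-row frame (illustration only, not the sample dataset)
demo = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=3, freq="D"),
    "sales": [100, 200, 300],
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
demo.to_csv(buffer, index=False)

# Read back without date parsing: the 'date' column comes back as text
buffer.seek(0)
plain = pd.read_csv(buffer)

# Read back with parse_dates: the 'date' column is restored as datetimes
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=["date"])

print(plain["date"].dtype)
print(parsed["date"].dtype)
```

Passing `parse_dates` at load time is an alternative to calling `pd.to_datetime` afterwards; either approach should give a column that can be used as a datetime index.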

1. How do you identify trends in a dataset using Pandas?
In [5]: import pandas as pd

# Path to your CSV file
filepath = r'D:\sales_data.csv'

# Step 1: Read the file
df = pd.read_csv(filepath)

# Step 2: Convert 'date' to datetime (unparseable values become NaT)
df['date'] = pd.to_datetime(df['date'], errors='coerce')

# Step 3: Set the datetime column as the index
df.set_index('date', inplace=True)

# Step 4: Confirm the index type
print(type(df.index))  # Should show DatetimeIndex

# Step 5: Resample to month-end frequency and take the mean
# ('M' still works here but is deprecated since pandas 2.2 in favor of 'ME')
monthly_trend = df['sales'].resample('M').mean()

print(monthly_trend.tail())

<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
date
2024-09-30 593.333333
2024-10-31 564.833333
2024-11-30 400.750000
2024-12-31 354.916667
2025-01-31 495.000000
Name: sales, dtype: float64
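
Monthly averages of noisy data can jump around, so a rolling mean is one common way to make the direction easier to see. This is a minimal sketch on hypothetical daily values; the 3-day window and the simple first-versus-last comparison are assumptions chosen for illustration, not part of the original analysis.

```python
import pandas as pd

# Hypothetical daily sales (illustration only, not the document's dataset)
dates = pd.date_range("2024-01-01", periods=10, freq="D")
sales = pd.Series([100, 120, 110, 130, 150, 140, 160, 180, 170, 190],
                  index=dates, name="sales")

# 3-day rolling mean; the first two values are NaN because the
# window does not yet contain three observations
rolling_mean = sales.rolling(window=3).mean()

# Crude direction check: compare the last smoothed value with the first valid one
first_valid = rolling_mean.dropna().iloc[0]
last_value = rolling_mean.iloc[-1]
trend_direction = "up" if last_value > first_valid else "down"

print(rolling_mean.tail(3))
print(f"Overall direction: {trend_direction}")
```

On real data the window would normally be longer (for example 7 or 30 days), chosen to match the seasonality you want to smooth out.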

2. How do you identify correlations between columns in a DataFrame?
In [7]: # Select only numeric columns
numeric_df = df.select_dtypes(include='number')

# Compute the pairwise correlation matrix (Pearson by default)
correlation_matrix = numeric_df.corr()

print(correlation_matrix)

                     sales  processing_time  customer_id    revenue       cost
sales             1.000000        -0.014953    -0.047259   0.006676  -0.005750
processing_time  -0.014953         1.000000    -0.013571   0.015301  -0.046653
customer_id      -0.047259        -0.013571     1.000000  -0.001218   0.017012
revenue           0.006676         0.015301    -0.001218   1.000000   0.020078
cost             -0.005750        -0.046653     0.017012   0.020078   1.000000
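
The values above are all close to zero, which is expected because the sample columns were generated independently at random. With real data it often helps to rank the column pairs by strength instead of scanning the whole matrix. The sketch below assumes a small hypothetical frame in which `revenue` was deliberately built to follow `ad_spend`; both column names are illustrative and do not come from the sample dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'revenue' roughly follows 'ad_spend', 'noise' is unrelated
demo = pd.DataFrame({
    "ad_spend": [10, 20, 30, 40, 50, 60],
    "revenue":  [110, 205, 310, 390, 520, 600],
    "noise":    [5, 3, 6, 2, 7, 4],
})

corr = demo.corr()

# Keep only the upper triangle (k=1 excludes the diagonal) so each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Rank pairs by absolute correlation, strongest first
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

A strong correlation only shows that two columns move together; it does not by itself show that one causes the other.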

3. How do you create a data story using Pandas and data visualization?
In [9]: import matplotlib.pyplot as plt

monthly_sales = df['sales'].resample('M').sum()

plt.figure(figsize=(10, 5))
plt.plot(monthly_sales, marker='o')
plt.title("Monthly Sales Trend")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.grid(True)
plt.show()
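
A chart tends to land better when it comes with one or two headline numbers. As a rough sketch, assuming hypothetical monthly totals, the month-over-month change and the peak month can be pulled out with pandas alone and then quoted alongside the plot.

```python
import pandas as pd

# Hypothetical monthly sales totals (illustration only)
monthly_sales = pd.Series(
    [1000, 1100, 990, 1188],
    index=pd.period_range("2024-01", periods=4, freq="M"),
    name="sales",
)

# Month-over-month change in percent (the first month has no predecessor, so it is NaN)
mom_change = monthly_sales.pct_change() * 100

best_month = monthly_sales.idxmax()   # month with the highest total
biggest_jump = mom_change.idxmax()    # month with the largest percentage gain

print(f"Peak month: {best_month} with sales of {monthly_sales.max()}")
print(f"Largest month-over-month gain: {mom_change.max():.1f}% in {biggest_jump}")
```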

4. How do you communicate complex data insights to non-technical stakeholders?
To communicate complex data insights to non-technical stakeholders, I:

1. Focus on the “So What?”

I highlight what the data means for the business rather than just presenting the numbers.

2. Use Clear Visuals

I use simple charts and graphs (like bar charts or trend lines) to make the insights intuitive and easy to digest.

3. Avoid Technical Jargon

I explain findings in plain language, such as saying “sales increased by 15% after the campaign” instead of using statistical terms.

4. Tell a Story

I structure the insight like a story: it begins with the business problem, continues with what the data shows, and ends with a recommended action.

5. How do you use Pandas to support business decision-making?
In [10]: # Example: Which product has the highest average sales?
avg_sales_by_product = df.groupby('product')['sales'].mean().sort_values(ascending=False)

print(avg_sales_by_product)

product
Product B 577.234310
Product C 560.983333
Product A 556.426877
Name: sales, dtype: float64
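
The averages above differ by only a few percent, so before acting on a ranking like this it can help to look at the spread and the number of rows behind each mean. A minimal sketch using named aggregation on a hypothetical frame (the column labels `avg_sales`, `std_sales`, and `n_orders` are made up for this example):

```python
import pandas as pd

# Hypothetical sales records (illustration only)
demo = pd.DataFrame({
    "product": ["A", "A", "A", "B", "B", "C"],
    "sales":   [100, 200, 300, 400, 600, 250],
})

# Mean, standard deviation, and row count per product in one pass
summary = (
    demo.groupby("product")["sales"]
        .agg(avg_sales="mean", std_sales="std", n_orders="count")
        .sort_values("avg_sales", ascending=False)
)
print(summary)
```

A product with a high average but very few rows, or a very large standard deviation, is weaker evidence for a decision than the mean alone suggests.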

6. How do you use Pandas to identify areas for process improvement?
In [11]: # Example: Find departments with the longest average processing times
avg_processing_time = df.groupby('department')['processing_time'].mean().sort_values(ascending=False)

print(avg_processing_time)

department
HR 5.167334
Operations 5.048581
Sales 4.934299
IT 4.892930
Name: processing_time, dtype: float64
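
Averages can hide a small number of very slow cases, which are often where improvement effort pays off. The sketch below assumes a hypothetical target of 8 time units and reports, per department, the median, the worst case, and the share of cases over that target; the threshold and the data are illustrative assumptions.

```python
import pandas as pd

# Hypothetical processing times (illustration only)
demo = pd.DataFrame({
    "department": ["HR", "HR", "HR", "IT", "IT", "IT", "Sales", "Sales"],
    "processing_time": [4.0, 5.0, 12.0, 3.0, 4.0, 5.0, 6.0, 9.0],
})

threshold = 8.0  # hypothetical service-level target, same units as processing_time

# Flag slow cases, then summarize each department
report = (
    demo.assign(over_target=demo["processing_time"] > threshold)
        .groupby("department")
        .agg(
            median_time=("processing_time", "median"),
            max_time=("processing_time", "max"),
            share_over_target=("over_target", "mean"),  # mean of booleans = fraction True
        )
        .sort_values("share_over_target", ascending=False)
)
print(report)
```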

7. How do you use Pandas to measure the effectiveness of a business strategy?
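In [13]: # Compare average sales before and after 2024-01-01
# (treated here as the campaign start date; 'date' is the index)
pre_campaign = df[df.index < '2024-01-01']['sales'].mean()
post_campaign = df[df.index >= '2024-01-01']['sales'].mean()

effectiveness = post_campaign - pre_campaign

print(f"Change in average sales: {effectiveness}")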

Change in average sales: -25.73515325670496

In [14]: # Same comparison with 'date' as a regular column instead of the index
df.reset_index(inplace=True)

pre_campaign = df[df['date'] < '2024-01-01']['sales'].mean()
post_campaign = df[df['date'] >= '2024-01-01']['sales'].mean()

effectiveness = post_campaign - pre_campaign

print(f"Change in average sales: {effectiveness}")

Change in average sales: -25.73515325670496
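
An absolute difference is hard to judge without a baseline, so one option is to label each row as "pre" or "post" and report the change as a percentage together with the number of rows in each period. This is only a sketch on hypothetical data; `launch_date` is an assumed campaign start, and a before/after comparison like this does not control for seasonality or other changes over the same period.

```python
import numpy as np
import pandas as pd

# Hypothetical sales around a campaign launch (illustration only)
demo = pd.DataFrame({
    "date": pd.to_datetime(["2023-11-01", "2023-12-01",
                            "2024-01-01", "2024-02-01", "2024-03-01"]),
    "sales": [400, 600, 500, 550, 600],
})

launch_date = pd.Timestamp("2024-01-01")  # hypothetical campaign start

# Label each row relative to the launch date
demo["period"] = np.where(demo["date"] < launch_date, "pre", "post")

# Average sales and row count per period
by_period = demo.groupby("period")["sales"].agg(["mean", "count"])

pre_mean = by_period.loc["pre", "mean"]
post_mean = by_period.loc["post", "mean"]
pct_change = (post_mean - pre_mean) / pre_mean * 100

print(by_period)
print(f"Change in average sales: {pct_change:.1f}%")
```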


8. How do you use Pandas to identify trends and patterns in customer behavior?
In [15]: # Example: Frequency of purchases per customer
purchase_freq = df.groupby('customer_id').size().sort_values(ascending=False)

print(purchase_freq.head())

customer_id
147 7
424 6
369 6
431 5
41 5
dtype: int64
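
Purchase frequency is one dimension of customer behavior; how recently a customer bought and how much they spent are two others that are often looked at together. A minimal sketch of such a profile, assuming hypothetical transactions and an assumed analysis date `snapshot`:

```python
import pandas as pd

# Hypothetical transactions (illustration only)
demo = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "date": pd.to_datetime(["2024-01-05", "2024-02-10", "2024-03-01",
                            "2024-01-20", "2024-01-25", "2024-02-15"]),
    "sales": [100, 150, 200, 300, 50, 400],
})

snapshot = pd.Timestamp("2024-03-31")  # hypothetical analysis date

# One row per customer: purchase count, total spend, and most recent purchase
profile = demo.groupby("customer_id").agg(
    purchases=("date", "size"),
    total_sales=("sales", "sum"),
    last_purchase=("date", "max"),
)

# Recency: days between the last purchase and the analysis date
profile["days_since_last"] = (snapshot - profile["last_purchase"]).dt.days

print(profile.sort_values("purchases", ascending=False))
```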

9. How do you use Pandas to create a data-driven business case?
In [16]: # Example: Total sales per product (sums the 'sales' column)
sales_by_product = df.groupby('product')['sales'].sum().sort_values(ascending=False)

print(sales_by_product)

product
Product A 140776
Product B 137959
Product C 134636
Name: sales, dtype: int64
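
A business case usually needs more than a total: profit, margin, and each product's share of revenue make the comparison easier to argue. The sketch below assumes a hypothetical frame with `revenue` and `cost` columns like those in the sample data; the derived column names are illustrative.

```python
import pandas as pd

# Hypothetical revenue and cost records (illustration only)
demo = pd.DataFrame({
    "product": ["A", "A", "B", "B", "C"],
    "revenue": [500.0, 300.0, 700.0, 500.0, 200.0],
    "cost":    [200.0, 200.0, 400.0, 200.0, 150.0],
})

# Total revenue and cost per product
case = demo.groupby("product")[["revenue", "cost"]].sum()

# Derived metrics for the business case
case["profit"] = case["revenue"] - case["cost"]
case["margin_pct"] = case["profit"] / case["revenue"] * 100
case["revenue_share_pct"] = case["revenue"] / case["revenue"].sum() * 100

case = case.sort_values("profit", ascending=False)
print(case.round(1))
```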

10. How do you use Pandas to measure the return on investment (ROI) of a business initiative?
In [17]: # Example ROI calculation for rows tagged 'New Campaign'
campaign = df[df['initiative'] == 'New Campaign']
total_gain = campaign['revenue'].sum()
total_cost = campaign['cost'].sum()

# ROI (%) = (gain - cost) / cost * 100
roi = (total_gain - total_cost) / total_cost * 100

print(f"ROI: {roi:.2f}%")

ROI: 92.30%
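
An ROI figure is easier to interpret next to a baseline. One way to get that, sketched here on hypothetical numbers, is to compute the same ROI formula for every value of 'initiative' with a single groupby. Note that if the column was loaded from CSV, the literal text 'None' may have been read as a missing value, and `groupby` drops missing keys unless `dropna=False` is passed.

```python
import pandas as pd

# Hypothetical revenue and cost per row (illustration only)
demo = pd.DataFrame({
    "initiative": ["New Campaign", "New Campaign", "None", "None"],
    "revenue": [600.0, 900.0, 500.0, 700.0],
    "cost":    [300.0, 200.0, 400.0, 400.0],
})

# Total revenue and cost per initiative; dropna=False keeps a group
# for missing labels if there are any
totals = demo.groupby("initiative", dropna=False)[["revenue", "cost"]].sum()

# ROI (%) = (revenue - cost) / cost * 100, computed per group
totals["roi_pct"] = (totals["revenue"] - totals["cost"]) / totals["cost"] * 100

print(totals)
```

Because the sample data assigns revenue and cost at random, any ROI gap between groups there is noise; on real data the comparison group should be chosen so that it is genuinely comparable to the campaign group.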
