Data Handling Questions

Copyright © All Rights Reserved

Data Exploration and Cleaning:

1. You have a DataFrame with customer purchase data. How would you identify and handle missing values in columns like
purchase_amount and customer_age?

2. A DataFrame contains product reviews with text data. How would you remove rows where the review is empty or consists only of
whitespace?

3. You need to identify duplicate rows in a DataFrame containing user data. How would you find and remove these duplicates?

4. You are given a DataFrame with a column containing date strings in different formats. How would you standardize this column to
datetime objects?

5. The dataset contains a column of product prices with some values as strings like '$100'. How would you convert this column to
numerical values?

6. You have a dataset with a column containing ages, but some values are recorded as negative numbers. How would you correct
this?

7. A DataFrame has a salary column with some entries mistakenly recorded as 'N/A'. How would you replace these with the average
salary?

8. Your DataFrame contains multiple columns with leading and trailing spaces in the column names. How would you clean up these
column names?

9. You receive a DataFrame with inconsistent state codes ('CA', 'California', 'calif'). How would you standardize the state column?

10. How would you remove rows from a DataFrame where a specific column contains outlier values?
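
The cleaning questions above can be answered with a handful of standard pandas calls. The sketch below works through Q1 to Q10 on a small, made-up DataFrame. The sample values, the extra column names (review, signup, price, salary, state) and the specific choices (median imputation, treating negative ages as missing, a hand-written state mapping, the 1.5 x IQR outlier rule) are illustrative assumptions, not the only valid answers. On pandas 2.0 or later, pd.to_datetime(..., format="mixed") is an alternative to the element-wise parsing used for Q4.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; review/signup/price/salary/state are assumed names,
# and two column names carry stray spaces on purpose (Q8).
df = pd.DataFrame({
    " customer_id ": [1, 2, 2, 3, 4, 5, 6, 7],
    "purchase_amount": [120.0, 85.0, 85.0, np.nan, 95.0, 110.0, 100.0, 5000.0],
    "customer_age": [34, 28, 28, -41, np.nan, 52, 45, 30],
    "review": ["Great", "   ", "   ", "", "Okay", "Bad", None, "Fine"],
    "signup": ["2024-01-15", "2024/02/01", "2024/02/01", "March 3, 2024",
               "2024-04-10", "5 May 2024", "2024-06-01", "2024-07-20"],
    "price": ["$100", "$1,250.50", "$1,250.50", "99", "$20", "$5", "$60", "$75"],
    "salary": ["50000", "N/A", "N/A", "70000", "60000", "N/A", "55000", "65000"],
    "state ": ["CA", "California", "California", "calif", "NY", "new york", "ca", "NY"],
})

# Q8: strip leading/trailing whitespace from every column name.
df.columns = df.columns.str.strip()

# Q3: count exact duplicate rows, then drop them (the first occurrence is kept).
n_dupes = int(df.duplicated().sum())
df = df.drop_duplicates().reset_index(drop=True)

# Q6: treat negative ages as invalid and blank them so Q1's imputation fills them.
df.loc[df["customer_age"] < 0, "customer_age"] = np.nan

# Q1: inspect missing values per column, then impute with the column median.
missing_per_column = df.isna().sum()
for col in ["purchase_amount", "customer_age"]:
    df[col] = df[col].fillna(df[col].median())

# Q2: keep only rows whose review is non-empty after stripping whitespace.
has_review = df["review"].fillna("").str.strip().ne("")
reviews = df[has_review]

# Q4: parse each string on its own so every row may use a different format.
df["signup"] = pd.to_datetime(df["signup"].map(pd.to_datetime))

# Q5: remove '$' and thousands separators, then cast to float.
df["price"] = df["price"].str.replace(r"[$,]", "", regex=True).astype(float)

# Q7: turn 'N/A' (and any other non-numeric text) into NaN, then fill with the mean.
df["salary"] = pd.to_numeric(df["salary"], errors="coerce")
df["salary"] = df["salary"].fillna(df["salary"].mean())

# Q9: normalise case/whitespace, then map known variants to a canonical code.
state_map = {"ca": "CA", "california": "CA", "calif": "CA", "ny": "NY", "new york": "NY"}
normalised = df["state"].str.strip().str.lower()
df["state"] = normalised.map(state_map).fillna(df["state"])  # unmapped values kept as-is

# Q10: drop rows outside 1.5 * IQR on purchase_amount (one common outlier rule).
p25, p75 = df["purchase_amount"].quantile([0.25, 0.75])
iqr = p75 - p25
in_range = df["purchase_amount"].between(p25 - 1.5 * iqr, p75 + 1.5 * iqr)
clean = df[in_range]
```

Imputing before the outlier filter means the median is still computed with the extreme value present; the median is used here precisely because it is far less sensitive to such values than the mean.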

Data Aggregation and Grouping:

11. You have sales data for multiple regions. How would you find the total sales for each region?

12. Given a DataFrame with columns city, month, and temperature, how would you calculate the average monthly temperature for
each city?

13. You need to find the most frequently purchased product for each customer in a DataFrame containing customer_id and product_id
columns. How would you do this?

14. How would you group a DataFrame by a category column and calculate both the mean and standard deviation for a sales column
within each group?

15. How do you create a summary table that shows the sum of sales for each combination of region and year?

16. You have a DataFrame with a timestamp column. How would you group data by week and calculate the total number of
occurrences for each week?

17. Given a DataFrame of customer orders, how would you identify customers who have placed more than 10 orders?

18. How would you find the top 3 products with the highest sales for each region?

19. You have a DataFrame with columns store, item, and revenue. How would you find the item with the highest average revenue per
store?

20. How would you calculate a cumulative sum of sales for each product category in a sales DataFrame?
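
A minimal sketch of Q11 to Q20 using groupby, pivot_table and resample on small hypothetical frames. Frame and column names not given in the questions (sales, orders, events, product, category and so on) are placeholders. Q19 is read as "for each store, which item has the highest mean revenue", and Q20 assumes the rows are already in the order the running total should follow.

```python
import pandas as pd

# Hypothetical sales data (names are assumptions where the questions give none).
sales = pd.DataFrame({
    "region": ["East", "East", "East", "East", "West", "West", "West", "West"],
    "year": [2023, 2023, 2024, 2024, 2023, 2024, 2024, 2024],
    "category": ["A", "B", "A", "B", "A", "A", "B", "B"],
    "product": ["p1", "p2", "p3", "p4", "p1", "p2", "p3", "p5"],
    "sales": [100, 200, 150, 50, 300, 120, 80, 60],
})

# Q11: total sales per region.
region_totals = sales.groupby("region")["sales"].sum()

# Q12: average temperature per city and month.
temps = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Oslo", "Rome", "Rome"],
    "month": ["Jan", "Jan", "Feb", "Jan", "Jan"],
    "temperature": [-3, -5, -1, 8, 10],
})
monthly_temp = temps.groupby(["city", "month"])["temperature"].mean()

# Q13 / Q17: one row per order (customer 1 places 12 orders, customer 2 places 3).
orders = pd.DataFrame({
    "customer_id": [1] * 12 + [2] * 3,
    "product_id": ["x"] * 7 + ["y"] * 5 + ["z", "z", "y"],
})
# Q13: most frequent product per customer (ties go to whichever value_counts lists first).
top_product = orders.groupby("customer_id")["product_id"].agg(lambda s: s.value_counts().idxmax())
# Q17: customers with more than 10 orders.
order_counts = orders["customer_id"].value_counts()
frequent_customers = order_counts[order_counts > 10].index.tolist()

# Q14: mean and standard deviation of sales per category.
category_stats = sales.groupby("category")["sales"].agg(["mean", "std"])

# Q15: region x year summary table of summed sales.
summary = sales.pivot_table(values="sales", index="region", columns="year",
                            aggfunc="sum", fill_value=0)

# Q16: events per calendar week ("W" bins end on Sunday).
events = pd.DataFrame({"timestamp": pd.to_datetime([
    "2024-01-01", "2024-01-02", "2024-01-03",
    "2024-01-08", "2024-01-09", "2024-01-10", "2024-01-11",
    "2024-01-16",
])})
weekly_counts = events.resample("W", on="timestamp").size()

# Q18: top 3 products by total sales within each region.
product_totals = sales.groupby(["region", "product"], as_index=False)["sales"].sum()
top3 = product_totals.sort_values("sales", ascending=False).groupby("region").head(3)

# Q19: for each store, the item with the highest mean revenue.
store_items = pd.DataFrame({
    "store": ["S1", "S1", "S1", "S2", "S2", "S2"],
    "item": ["a", "a", "b", "a", "b", "b"],
    "revenue": [10, 30, 25, 40, 20, 30],
})
avg_rev = store_items.groupby(["store", "item"], as_index=False)["revenue"].mean()
best_item = avg_rev.loc[avg_rev.groupby("store")["revenue"].idxmax()]

# Q20: running total of sales within each category, in current row order
# (sort by a date column first if the frame has one).
sales["cum_sales"] = sales.groupby("category")["sales"].cumsum()
```

For Q18, aggregating per product before ranking matters when a product appears on several rows; sorting the raw rows and taking head(3) would rank individual transactions instead.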

Data Manipulation and Transformation:

21. You need to add a column to a DataFrame that indicates whether each customer_age value is above or below the median age.
How would you do this?

22. How would you normalize the price column in a DataFrame so that it has a mean of 0 and a standard deviation of 1?

23. You are given a DataFrame with a date column. How would you add separate columns for year, month, and day?

24. A DataFrame has columns start_time and end_time in datetime format. How would you create a new column showing the time
difference in hours?

25. You need to replace all occurrences of a specific product ID in a DataFrame with a new ID. How would you do this?

26. How would you convert a column containing string representations of lists into actual Python lists in a DataFrame?

27. You have two DataFrames with overlapping data. How would you merge them such that only rows that match in both DataFrames
are included in the final result?

28. How would you pivot a DataFrame to transform rows into columns based on a year column?

29. You need to filter rows based on a condition that involves multiple columns (e.g., age > 30 and income > 50000). How would you
do this?

30. How do you create a column that ranks each row based on the score column?
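
The transformation questions (Q21 to Q30) map onto single pandas expressions. This sketch uses a hypothetical five-row customer table; every column name, the product IDs "P-OLD"/"P-NEW" and the tie-handling choices (ages equal to the median labelled "below", tied scores sharing the best rank) are assumptions made for illustration.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical customer data; all column names are illustrative assumptions.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "customer_age": [25, 35, 45, 30, 50],
    "income": [40000, 60000, 80000, 55000, 45000],
    "price": [10.0, 20.0, 30.0, 40.0, 50.0],
    "score": [88, 92, 75, 92, 60],
    "date": pd.to_datetime(["2024-01-15", "2024-02-20", "2024-03-05", "2024-04-30", "2024-05-01"]),
    "start_time": pd.to_datetime(["2024-01-15 08:00", "2024-01-15 09:30", "2024-01-15 22:00",
                                  "2024-01-16 10:00", "2024-01-16 12:15"]),
    "end_time": pd.to_datetime(["2024-01-15 10:30", "2024-01-15 17:30", "2024-01-16 01:00",
                                "2024-01-16 10:45", "2024-01-16 12:45"]),
    "product_id": ["P-OLD", "P2", "P-OLD", "P3", "P4"],
    "tags": ["['a', 'b']", "['c']", "[]", "['a']", "['b', 'c']"],
})

# Q21: label each age relative to the median (ages equal to the median count as "below").
median_age = df["customer_age"].median()
df["age_vs_median"] = np.where(df["customer_age"] > median_age, "above", "below")

# Q22: z-score standardisation (pandas' std() is the sample std, ddof=1).
df["price_z"] = (df["price"] - df["price"].mean()) / df["price"].std()

# Q23: split the date into components via the .dt accessor.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day

# Q24: Timedelta -> seconds -> hours.
df["duration_h"] = (df["end_time"] - df["start_time"]).dt.total_seconds() / 3600

# Q25: replace one (hypothetical) product ID with another.
df["product_id"] = df["product_id"].replace("P-OLD", "P-NEW")

# Q26: safely parse "['a', 'b']"-style strings into Python lists (no eval()).
df["tags"] = df["tags"].apply(ast.literal_eval)

# Q27: an inner join keeps only customer_ids present in both frames.
orders = pd.DataFrame({"customer_id": [2, 3, 6], "amount": [15.0, 22.5, 9.0]})
merged = df.merge(orders, on="customer_id", how="inner")

# Q28: long -> wide, one column per year.
long_sales = pd.DataFrame({
    "product": ["p1", "p1", "p2", "p2"],
    "year": [2023, 2024, 2023, 2024],
    "sales": [10, 12, 7, 9],
})
wide = long_sales.pivot(index="product", columns="year", values="sales")

# Q29: combine conditions with & (each condition wrapped in parentheses).
filtered = df[(df["customer_age"] > 30) & (df["income"] > 50000)]

# Q30: rank by score, highest = 1; tied scores share the lowest rank number.
df["score_rank"] = df["score"].rank(ascending=False, method="min").astype(int)
```

ast.literal_eval only accepts Python literals, which is why it is preferred over eval() for Q26. If a product/year pair could appear twice in Q28, pivot would raise an error and pivot_table with an aggfunc would be needed instead.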

Time Series Analysis:

31. A DataFrame contains stock prices with a date column. How would you calculate the rolling 7-day average of the stock prices?

32. You have a time series DataFrame with irregular time intervals. How would you resample the data to a daily frequency and fill
missing dates with the last observed value?

33. How would you create a column that indicates whether each date in a date column is a weekday or a weekend?

34. Given a DataFrame with a date column, how would you find the day of the week with the highest average sales?

35. How would you calculate the percentage change in a price column over a specific period, such as monthly changes?

36. You need to extract the quarter from a datetime column. How would you do this?

37. Given a DataFrame with daily temperature data, how would you find the day with the largest temperature increase?

38. How would you downsample a DataFrame with minute-level data to hourly data using the sum of values?

39. How would you create a time series plot of sales data using pandas?

40. You need to find the rolling maximum value in a time series over a 30-day window. How would you do this?
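
The time series questions (Q31 to Q40) rely mostly on the .dt accessor, offset-based rolling windows and resample. The sketch below uses 60 days of invented data; Q37 is interpreted as the largest day-over-day rise, and Q35 groups by to_period("M") to avoid depending on the month-end resample alias, which recent pandas releases renamed from "M" to "ME". Q39 needs matplotlib installed, so that step is guarded.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data for 60 days starting Monday 2024-01-01; the sales pattern
# repeats Monday..Sunday and the temperature pattern repeats every 5 days.
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=60, freq="D"),
    "price": np.arange(100.0, 160.0),
    "sales": np.tile([10, 12, 11, 13, 14, 30, 25], 9)[:60],
    "temperature": np.tile([0.0, 2.0, 1.0, 5.0, 3.0], 12),
})

# Q31 / Q40: offset-based windows ("7D", "30D") need a DatetimeIndex.
s = df.set_index("date")
s["price_7d_mean"] = s["price"].rolling("7D").mean()
s["price_30d_max"] = s["price"].rolling("30D").max()

# Q32: irregular observations -> one row per day, gaps filled with the last value.
irregular = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2024-01-01 09:00", "2024-01-03 14:00", "2024-01-06 08:00"]),
)
daily = irregular.resample("D").last().ffill()

# Q33: dayofweek is 0 for Monday ... 6 for Sunday.
df["is_weekend"] = df["date"].dt.dayofweek >= 5

# Q34: average sales per weekday name, then pick the highest.
best_day = df.groupby(df["date"].dt.day_name())["sales"].mean().idxmax()

# Q35: month-over-month change of the last price in each month.
month_end_price = df.groupby(df["date"].dt.to_period("M"))["price"].last()
monthly_change = month_end_price.pct_change()

# Q36: calendar quarter (1-4).
df["quarter"] = df["date"].dt.quarter

# Q37: day-over-day change; idxmax returns the first row with the largest rise.
df["temp_change"] = df["temperature"].diff()
biggest_rise_day = df.loc[df["temp_change"].idxmax(), "date"]

# Q38: minute-level values summed into hourly buckets.
minutes = pd.Series(1, index=pd.date_range("2024-01-01", periods=180, freq="min"))
hourly = minutes.resample("60min").sum()

# Q39: line plot via pandas' matplotlib backend (skipped if matplotlib is missing).
try:
    import matplotlib

    matplotlib.use("Agg")  # non-interactive backend; call plt.show() in an interactive session
    ax = df.plot(x="date", y="sales", title="Daily sales")
except ImportError:
    ax = None
```

Offset windows such as "7D" cover calendar time, so missing days shrink the window rather than pulling in older rows; rolling(7) would instead always use the last seven rows regardless of gaps.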

Advanced Data Analysis:

41. You have a DataFrame with customer transactions, and you want to calculate the average time between purchases for each
customer. How would you do this?

42. How would you identify rows where a category changes between consecutive rows in a sorted DataFrame?

43. A DataFrame contains survey responses with columns like rating_1 to rating_5. How would you calculate the average rating for
each row?

44. You have a DataFrame with user interactions and want to calculate the number of unique interactions per user. How would you do
this?

45. How would you identify the top 5 most common words in a column of text data?

46. You have a DataFrame with a column of strings. How would you create a new column that indicates the length of each string?

47. How would you calculate the correlation between numerical columns in a DataFrame?

48. Given a DataFrame with sales data, how would you identify the month with the highest sales for each product?

49. You have a DataFrame with a score column. How would you categorize the scores into bins (e.g., low, medium, high)?

50. How would you calculate the Gini coefficient for a column of income values in a DataFrame?
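
One way to approach Q41 to Q50 is sketched below on small hypothetical frames (tx, seq, survey, interactions, docs, prod_sales are placeholder names). The word counting in Q45 is a simple lower-case, \w+ tokenisation; the bin edges in Q49 are invented; and Q50 uses the rank-based Gini formula G = 2·Σ i·x_(i) / (n·Σ x) − (n + 1)/n on sorted values, which is equivalent to the mean-absolute-difference definition and assumes non-negative incomes with a non-zero total.

```python
import numpy as np
import pandas as pd

# Q41: hypothetical transactions; mean gap in days between a customer's purchases.
tx = pd.DataFrame({
    "customer_id": [1, 2, 1, 1, 2],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-10", "2024-01-12"]),
})
tx = tx.sort_values(["customer_id", "ts"])
tx["gap_days"] = tx.groupby("customer_id")["ts"].diff().dt.total_seconds() / 86400
avg_gap_days = tx.groupby("customer_id")["gap_days"].mean()  # each first purchase's NaN gap is skipped

# Q42: True where the category differs from the previous row (row 0 is always True).
seq = pd.DataFrame({"category": ["a", "a", "b", "b", "a"]})
seq["changed"] = seq["category"].ne(seq["category"].shift())

# Q43: row-wise mean across rating_1 ... rating_5.
survey = pd.DataFrame([[5, 4, 3, 4, 4], [1, 2, 3, 4, 5]],
                      columns=[f"rating_{i}" for i in range(1, 6)])
survey["avg_rating"] = survey.filter(regex=r"^rating_\d+$").mean(axis=1)

# Q44: number of distinct interaction types per user.
interactions = pd.DataFrame({"user_id": [1, 1, 1, 2],
                             "interaction": ["click", "view", "click", "view"]})
unique_per_user = interactions.groupby("user_id")["interaction"].nunique()

# Q45 / Q46: simple word counts (lower-cased \w+ tokens) and string lengths.
docs = pd.DataFrame({"text": ["The cat sat", "the dog sat on the mat", "A cat and a dog"]})
top_words = docs["text"].str.lower().str.findall(r"\w+").explode().value_counts().head(5)
docs["length"] = docs["text"].str.len()

# Q47: pairwise Pearson correlation of the numeric columns only.
nums = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1],
                     "label": list("abcd")})
corr = nums.select_dtypes("number").corr()

# Q48: month with the highest total sales for each product.
prod_sales = pd.DataFrame({
    "product": ["p1", "p1", "p1", "p2", "p2", "p2"],
    "date": pd.to_datetime(["2024-01-03", "2024-01-20", "2024-02-11",
                            "2024-01-05", "2024-02-07", "2024-03-15"]),
    "sales": [10, 5, 20, 30, 8, 9],
})
prod_sales["month"] = prod_sales["date"].dt.to_period("M")
by_month = prod_sales.groupby(["product", "month"], as_index=False)["sales"].sum()
best_month = by_month.loc[by_month.groupby("product")["sales"].idxmax()]

# Q49: hypothetical bin edges; include_lowest makes the first bin closed at 0.
scores = pd.Series([35, 50, 65, 80, 95])
score_band = pd.cut(scores, bins=[0, 50, 80, 100], labels=["low", "medium", "high"],
                    include_lowest=True)


# Q50: rank-based Gini coefficient (assumes non-negative values and a non-zero total).
def gini(values):
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1) / n


income_gini = gini(pd.Series([20000, 30000, 50000, 100000]))
```

The Gini helper returns 0 for perfectly equal incomes and (n − 1)/n when one person holds everything, which is a quick sanity check for any implementation.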
