PySpark Basic Interview Questions

This document collects basic PySpark interview questions covering fundamental concepts such as RDDs and DataFrames, and common operations like filtering, joining, and handling null values. It also touches on the differences between PySpark and Pandas, User Defined Functions (UDFs), and Spark's lazy evaluation model. Short, hedged code sketches for many of the questions follow the list.

1. What is PySpark?
2. How does PySpark differ from Pandas?
3. What is RDD in PySpark?
4. What is the difference between RDD and DataFrame?
5. How do you create a DataFrame in PySpark?
6. What are the different ways to read data into a DataFrame?
7. What is the difference between select() and selectExpr()?
8. How do you filter data in PySpark?
9. What is the difference between filter() and where()?
10. How do you add a new column to a DataFrame?
11. How do you drop a column from a DataFrame?
12. How do you rename a column in PySpark?
13. What are different join types in PySpark?
14. How do you perform an inner join in PySpark?
15. What is the difference between join() and crossJoin()?
16. What is the use of groupBy() in PySpark?
17. How do you apply aggregate functions in PySpark?
18. How do you handle missing/null values in PySpark?
19. How do you replace null values in PySpark?
20. What is the difference between dropna(), fillna(), and replace()?
21. How do you remove duplicate rows in PySpark?
22. What is the difference between distinct() and dropDuplicates()?
23. How do you sort data in PySpark?
24. What is the difference between orderBy() and sort()?
25. What is a UDF (User Defined Function) in PySpark?
26. How do you register and use a UDF in PySpark?
27. What is the difference between map() and flatMap() in PySpark?
28. What is lazy evaluation in PySpark?
29. What are actions and transformations in PySpark?
30. What is the difference between collect() and show()?
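
Code Sketches

The sketches below illustrate several of the questions above. They are minimal examples, not definitive implementations: the sample rows, app name, and file paths are placeholders invented for illustration, and each snippet assumes a local SparkSession.

For questions 5 and 6, a sketch of building a DataFrame from in-memory rows and of the common file readers (the paths people.csv, people.json, and people.parquet are hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("interview-prep").getOrCreate()

    # Build a DataFrame from a list of tuples with an explicit DDL-style schema
    df = spark.createDataFrame(
        [(1, "Alice", 34), (2, "Bob", 29), (3, "Cara", None)],
        schema="id INT, name STRING, age INT",
    )

    # Common readers; these file paths are placeholders
    csv_df = spark.read.csv("people.csv", header=True, inferSchema=True)
    json_df = spark.read.json("people.json")
    parquet_df = spark.read.parquet("people.parquet")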
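
For question 7: select() takes column names or column expressions built in Python, while selectExpr() takes SQL expression strings. A minimal sketch with made-up sample rows:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, "Alice", 34), (2, "Bob", 29)], "id INT, name STRING, age INT"
    )

    # select() with a Python column expression
    df.select("name", (F.col("age") + 1).alias("age_next_year")).show()

    # selectExpr() with an equivalent SQL expression string
    df.selectExpr("name", "age + 1 AS age_next_year").show()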
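
For questions 8 and 9: filter() and where() are aliases, and both accept either column expressions or SQL strings. A sketch using the same illustrative rows:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, "Alice", 34), (2, "Bob", 29)], "id INT, name STRING, age INT"
    )

    # filter() and where() behave identically
    df.filter(F.col("age") > 30).show()
    df.where("age > 30").show()

    # Combine predicates with & and |, wrapping each comparison in parentheses
    df.filter((F.col("age") > 20) & (F.col("name") != "Bob")).show()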
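
For questions 10 through 12, a sketch of adding, dropping, and renaming columns. Note that DataFrames are immutable, so each call returns a new DataFrame:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "Alice", 34)], "id INT, name STRING, age INT")

    # withColumn() adds (or overwrites) a column
    with_flag = df.withColumn("is_senior", F.col("age") >= 60)

    # drop() removes a column; withColumnRenamed() renames one
    dropped = with_flag.drop("is_senior")
    renamed = dropped.withColumnRenamed("name", "full_name")
    renamed.show()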
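
For questions 13 through 15, a sketch of keyed joins versus crossJoin(), using two small invented tables:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    people = spark.createDataFrame([(1, "Alice"), (2, "Bob")], "id INT, name STRING")
    orders = spark.createDataFrame(
        [(1, 100.0), (1, 50.0), (3, 75.0)], "id INT, amount DOUBLE"
    )

    # how= accepts inner, left, right, full/outer, left_semi, left_anti, cross
    people.join(orders, on="id", how="inner").show()
    people.join(orders, on="id", how="left").show()

    # crossJoin() takes no key and returns the Cartesian product
    people.crossJoin(orders).show()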
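
For questions 16 and 17: groupBy() returns a GroupedData object, and agg() applies one or more aggregate functions to it. A sketch with invented sales rows:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    sales = spark.createDataFrame(
        [("east", 100.0), ("east", 50.0), ("west", 75.0)],
        "region STRING, amount DOUBLE",
    )

    # Several aggregates in one pass, each given a readable alias
    sales.groupBy("region").agg(
        F.count("*").alias("n_orders"),
        F.sum("amount").alias("total"),
        F.avg("amount").alias("avg_amount"),
    ).show()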
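
For questions 18 through 20, a sketch contrasting dropna(), fillna(), and replace(). The key distinction: dropna() and fillna() act on nulls, while replace() swaps concrete (non-null) values:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, "Alice", 34), (2, None, None), (3, "N/A", 29)],
        "id INT, name STRING, age INT",
    )

    # dropna() removes rows containing nulls, optionally in a subset of columns
    df.dropna(subset=["age"]).show()

    # fillna() substitutes a default value per column
    df.fillna({"name": "unknown", "age": 0}).show()

    # replace() swaps concrete values such as sentinel strings
    df.replace("N/A", "missing", subset=["name"]).show()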
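
For questions 21 through 24: distinct() deduplicates on all columns, dropDuplicates() can deduplicate on a subset, and sort()/orderBy() are aliases. A sketch:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, "Alice"), (1, "Alice"), (2, "Alice")], "id INT, name STRING"
    )

    # distinct() considers the whole row; dropDuplicates() can use a column subset
    df.distinct().show()
    df.dropDuplicates(["name"]).show()

    # sort() and orderBy() are aliases; desc() flips the default ascending order
    df.orderBy(F.col("id").desc()).show()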
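
For questions 25 and 26, a sketch of defining a UDF (the function shout here is invented for illustration), using it on a DataFrame, and registering it for SQL:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "Alice"), (2, None)], "id INT, name STRING")

    # A UDF wraps a plain Python function; declare the return type explicitly
    @udf(returnType=StringType())
    def shout(s):
        return s.upper() if s is not None else None

    df.select(shout("name").alias("loud_name")).show()

    # Registering the UDF makes it callable from SQL queries as well
    spark.udf.register("shout_sql", shout)
    df.createOrReplaceTempView("people")
    spark.sql("SELECT shout_sql(name) AS loud_name FROM people").show()

An interview-relevant follow-up: UDFs are generally slower than built-in functions because each row is serialized between the JVM and a Python worker, so built-ins are preferred where they exist.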
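
For question 27, a sketch on the RDD API: map() produces exactly one output element per input element, while flatMap() flattens the per-element results into a single sequence:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    rdd = spark.sparkContext.parallelize(["a b", "c d e"])

    # map(): one list per input line
    print(rdd.map(lambda line: line.split(" ")).collect())
    # [['a', 'b'], ['c', 'd', 'e']]

    # flatMap(): the lists are flattened into one sequence of words
    print(rdd.flatMap(lambda line: line.split(" ")).collect())
    # ['a', 'b', 'c', 'd', 'e']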
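
For questions 28 through 30: transformations (filter, select, join, ...) only build an execution plan, and nothing runs until an action (show, collect, count, write, ...) is called. A sketch also contrasting show() with collect():

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, "Alice", 34), (2, "Bob", 29)], "id INT, name STRING, age INT"
    )

    # Transformation: lazily recorded in the plan, not executed yet
    filtered = df.filter(F.col("age") > 30)

    # Actions trigger execution. show() prints a bounded number of rows;
    # collect() pulls every row back to the driver, which can exhaust
    # driver memory on large datasets
    filtered.show(5)
    rows = filtered.collect()
    print(rows)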
