PySpark Basic Interview Questions
1. What is PySpark?
2. How does PySpark differ from Pandas?
3. What is RDD in PySpark?
4. What is the difference between RDD and DataFrame?
5. How do you create a DataFrame in PySpark?
6. What are the different ways to read data into a DataFrame?
7. What is the difference between select() and selectExpr()?
8. How do you filter data in PySpark?
9. What is the difference between filter() and where()?
10. How do you add a new column to a DataFrame?
11. How do you drop a column from a DataFrame?
12. How do you rename a column in PySpark?
13. What are the different join types in PySpark?
14. How do you perform an inner join in PySpark?
15. What is the difference between join() and crossJoin()?
16. What is the use of groupBy() in PySpark?
17. How do you apply aggregate functions in PySpark?
18. How do you handle missing/null values in PySpark?
19. How do you replace null values in PySpark?
20. What is the difference between dropna(), fillna(), and replace()?
21. How do you remove duplicate rows in PySpark?
22. What is the difference between distinct() and dropDuplicates()?
23. How do you sort data in PySpark?
24. What is the difference between orderBy() and sort()?
25. What is a UDF (User Defined Function) in PySpark?
26. How do you register and use a UDF in PySpark?
27. What is the difference between map() and flatMap() in PySpark?
28. What is lazy evaluation in PySpark?
29. What are actions and transformations in PySpark?
30. What is the difference between collect() and show()?