Code Review
To understand step 17, here is a demo. You can also visit this page for more
information: https://www.datasciencemadesimple.com/populate-row-number-in-pyspark-row-number-by-group/
18. In line 46 we filter on the rank values, keeping only the rows whose rank is less
than 6 (that is, the top 5 rows) in every npa group.
19. In line 47 we filter on the distance column, keeping only the rows whose distance is
less than MAX_DISTANCE.
20. In line 48 we finally save the result to the output file. The output path is supplied
at run time as the second argument.
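
The logic of steps 17 to 20 can be illustrated without Spark. The sketch below is plain Python, not the reviewed PySpark script: it numbers the rows inside each npa group, keeps ranks below 6, drops rows at or beyond MAX_DISTANCE, and writes the result to a CSV file. The ordering by distance, the value of MAX_DISTANCE, and the function names top_rows_per_group and write_rows are assumptions made for illustration only.

```python
import csv
from collections import defaultdict

MAX_DISTANCE = 50.0  # assumed value; the real threshold is defined in the script
MAX_RANK = 6         # step 18 keeps ranks below 6, i.e. ranks 1 to 5


def top_rows_per_group(rows, max_rank=MAX_RANK, max_distance=MAX_DISTANCE):
    """Number rows within each npa group, then apply the two filters."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["npa"]].append(row)

    kept = []
    for npa in sorted(groups):
        # Assumed ordering: nearest first. Ranks start at 1, as a row number would.
        ordered = sorted(groups[npa], key=lambda r: r["distance"])
        for rank, row in enumerate(ordered, start=1):
            # Step 18: rank filter. Step 19: distance filter.
            if rank < max_rank and row["distance"] < max_distance:
                kept.append({**row, "rank": rank})
    return kept


def write_rows(rows, output_path):
    """Step 20: save the filtered rows to the given output path as CSV."""
    with open(output_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["npa", "distance", "rank"])
        writer.writeheader()
        writer.writerows(rows)
```

In a command-line script, output_path would typically be read from sys.argv[2], matching the "second argument" mentioned in step 20.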