PySpark 12 Questions
PySpark Tricks
df = df.dropDuplicates(["name", "age"])
df.groupBy("department").agg({"salary": "avg", "bonus": "sum"}).show()
df = df.withColumn("new_column", df["existing_column"] * 10)
from pyspark.sql.functions import explode
df_exploded = df.withColumn("exploded_column", explode(df["array_column"]))
df = df.coalesce(5)
df = df.repartition(10, "department")
# some_function receives an iterator over one partition's rows and must return an iterator
result_rdd = df.rdd.mapPartitions(some_function)
df.write.mode("overwrite").partitionBy("year", "month").parquet("output_path")
https://www.seekhobigdata.com/
+91 99894 54737