Pandas - PySpark Equivalents-1
| OPERATION | PANDAS | PYSPARK |
|---|---|---|
| Selecting Columns | `df[['column1', 'column2']]` | `df.select('column1', 'column2')` |
| Filtering Data | `df[df['column'] > value]` | `df.filter(df['column'] > value)` |
| Handling Missing Values | `df.dropna()` | `df.na.drop()` |
| Renaming Columns | `df.rename(columns={'old_name': 'new_name'})` | `df.withColumnRenamed('old_name', 'new_name')` |
| Creating New Column | `df['new_column'] = values` | `df.withColumn('new_column', values)` |
| Display DF Schema Info | `df.info()` | `df.printSchema()` |
| Column Deletion | `df.drop(columns=['column_name'])` | `df.drop('column_name')` |
| Dropping Duplicates | `df.drop_duplicates()` | `df.dropDuplicates()` |
| Dataframe Concatenation | `pd.concat([df1, df2])` | `df1.union(df2)` |
| Find Unique Values | `df['column_name'].unique()` | `df.select('column_name').distinct()` |
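A minimal runnable sketch of the selection, filtering, renaming, and schema rows above. The pandas calls execute as written; the PySpark equivalents appear as comments and assume a hypothetical Spark DataFrame `sdf` created from an active `SparkSession` (names not taken from this sheet).

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Bob"],
    "age": [34, 28, 41, 28],
})

# Selecting columns
names = df[["name"]]                      # PySpark: sdf.select('name')

# Filtering rows
adults = df[df["age"] > 30]               # PySpark: sdf.filter(sdf['age'] > 30)

# Renaming a column
renamed = df.rename(columns={"name": "full_name"})
# PySpark: sdf.withColumnRenamed('name', 'full_name')

# Schema / structure info (prints to stdout)
df.info()                                 # PySpark: sdf.printSchema()
```

Note that the pandas calls return results eagerly, while the commented PySpark transformations are lazy: nothing executes until an action such as `show()` or `collect()` is called.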
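The remaining rows (missing values, new columns, duplicates, concatenation/union, unique values) can be sketched the same way. Again the pandas side runs as-is and the PySpark side is commented; `sdf`, `sdf1`, and `sdf2` are hypothetical Spark DataFrames, and `union` assumes both inputs share the same schema.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["NY", "LA", None, "NY"],
    "temp": [70, 75, 68, 70],
})

# Dropping rows with missing values
clean = df.dropna()                       # PySpark: sdf.na.drop()

# Creating a new column from existing values
df2 = df.copy()
df2["temp_c"] = (df2["temp"] - 32) * 5 / 9
# PySpark: sdf.withColumn('temp_c', (sdf['temp'] - 32) * 5 / 9)

# Dropping exact duplicate rows
dedup = df.drop_duplicates()              # PySpark: sdf.dropDuplicates()

# Concatenation / union (schemas must match for union)
both = pd.concat([df, df])                # PySpark: sdf1.union(sdf2)

# Unique values of one column
cities = df["city"].unique()              # PySpark: sdf.select('city').distinct()
```

One practical difference: `pd.concat` aligns on column names, while PySpark's `union` matches columns by position; `unionByName` is the positional-safe alternative when column order may differ.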