Support Functions

This document contains two functions. min_max_scaler takes a dataframe and a list of columns and returns the dataframe with a min-max scaled copy of each listed column added under the name 'scaled_' plus the original column name. column_dropper takes a dataframe and a threshold and returns the dataframe without any column whose fraction of missing values exceeds the threshold.

Uploaded by

Tu Phung
Copyright
© All Rights Reserved

def min_max_scaler(df, cols_to_scale):
    # Takes a dataframe and list of columns to minmax scale. Returns a dataframe.
    for col in cols_to_scale:
        # Define min and max values and collect them
        max_values = df.agg({col: 'max'}).collect()[0][0]
        min_values = df.agg({col: 'min'}).collect()[0][0]
        new_column_name = 'scaled_' + col
        # Create a new column based off the scaled data
        df = df.withColumn(new_column_name,
                           (df[col] - min_values) / (max_values - min_values))
    return df
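
To make the arithmetic concrete, the following is a minimal plain-Python sketch of the same formula, (x - min) / (max - min), applied to a small list rather than a Spark dataframe. The function name min_max_scale_values and the sample values are hypothetical and are not part of the original script. Returning None for constant input is a choice made for this sketch only.

```python
def min_max_scale_values(values):
    """Scale a list of numbers to the range [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: the denominator would be zero, so no scaled value is defined.
        return [None for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


scaled = min_max_scale_values([10.0, 15.0, 20.0])
print(scaled)  # [0.0, 0.5, 1.0]
```

The smallest value maps to 0.0 and the largest to 1.0, which is what each 'scaled_' column in min_max_scaler holds for its source column.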

def column_dropper(df, threshold):
    # Takes a dataframe and a threshold for missing values,
    # expressed as a fraction between 0 and 1 (for example 0.6).
    # Returns a dataframe.
    total_records = df.count()
    for col in df.columns:
        # Calculate the fraction of missing (null) values
        missing = df.where(df[col].isNull()).count()
        missing_percent = missing / total_records
        # Drop column if the fraction of missing values is more than threshold
        if missing_percent > threshold:
            df = df.drop(col)
    return df
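
The threshold rule can be illustrated without Spark. The sketch below applies the same test (missing fraction greater than threshold) to a plain dictionary of columns. The name drop_sparse_columns, the table layout, and the sample data are hypothetical, and the sketch assumes every column has at least one row.

```python
def drop_sparse_columns(table, threshold):
    """Return a copy of `table` without columns whose missing fraction exceeds `threshold`."""
    kept = {}
    for name, values in table.items():
        # Fraction of entries that are None (assumes the column is not empty)
        missing_fraction = sum(v is None for v in values) / len(values)
        if missing_fraction <= threshold:
            kept[name] = values
    return kept


table = {
    "age": [31, None, 45, 28],             # 1 of 4 missing: 0.25
    "income": [None, None, None, 52000],   # 3 of 4 missing: 0.75
}
result = drop_sparse_columns(table, threshold=0.5)
print(list(result))  # ['age']
```

A column whose missing fraction equals the threshold exactly is kept, matching the strict `>` comparison in column_dropper.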
