Support Functions

This document contains two PySpark helper functions. min_max_scaler takes a DataFrame and a list of column names and returns the DataFrame with a min-max-scaled copy of each listed column added (named scaled_<column>). column_dropper takes a DataFrame and a threshold and returns the DataFrame without every column whose fraction of missing (null) values exceeds that threshold.
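
The two rules can be written out explicitly. The symbols are notation introduced here, not names from the code: x is a value in column c, n is the number of rows, m_c is the number of nulls in column c, and t is the threshold, given as a fraction between 0 and 1 (so t = 0.6 drops columns that are more than 60% null). Spark's min and max ignore nulls.

```latex
x' = \frac{x - \min(c)}{\max(c) - \min(c)}
\qquad\qquad
\text{drop } c \iff \frac{m_c}{n} > t
```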

Uploaded by

Tu Phung


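from pyspark.sql import functions as F


def min_max_scaler(df, cols_to_scale):
    # Takes a Spark DataFrame and a list of columns to min-max scale.
    # Adds a 'scaled_<col>' column with values in [0, 1] for each one
    # (the original columns are kept) and returns the new DataFrame.
    for col in cols_to_scale:
        # Compute the column's max and min and collect them to the driver
        max_value = df.agg({col: 'max'}).collect()[0][0]
        min_value = df.agg({col: 'min'}).collect()[0][0]
        new_column_name = 'scaled_' + col
        if max_value == min_value:
            # Zero range (constant or all-null column): dividing would fail or
            # yield nulls, so map every non-null value to 0.0 instead
            scaled = F.when(df[col].isNotNull(), F.lit(0.0))
        else:
            scaled = (df[col] - min_value) / (max_value - min_value)
        # Create a new column holding the scaled data
        df = df.withColumn(new_column_name, scaled)
    return df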

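def column_dropper(df, threshold):
    # Takes a Spark DataFrame and a threshold for missing values, given as a
    # fraction between 0 and 1 (e.g. 0.6 means 60%).
    # Returns the DataFrame without the columns whose null fraction exceeds it.
    total_records = df.count()
    if total_records == 0:
        # Nothing to measure on an empty DataFrame; avoid dividing by zero
        return df
    # df.columns is evaluated once, so dropping inside the loop is safe
    for col in df.columns:
        # Calculate the fraction of missing (null) values in this column
        missing = df.where(df[col].isNull()).count()
        missing_fraction = missing / total_records
        # Drop the column if its fraction of missing values exceeds the threshold
        if missing_fraction > threshold:
            df = df.drop(col)
    return df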
