Data Preprocessing and Cleaning

Data preprocessing is essential to ensure clean, usable data for ML models.

Key Steps:
- Handling missing values (`df.fillna()`, `df.dropna()`)
- Encoding categorical data (`pd.get_dummies()`, `LabelEncoder`)
- Scaling numerical features (`StandardScaler`, `MinMaxScaler`)
- Splitting data into training and test sets (`train_test_split`)

Example:
--------------------------------
import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder

# df is assumed to be a pandas DataFrame with numeric columns
# 'feature1' and 'feature2' and a categorical column 'category'.

# Fill missing numeric values with each column's mean
# (numeric_only avoids errors on non-numeric columns).
df = df.fillna(df.mean(numeric_only=True))

# Encode the categorical column as integer labels.
le = LabelEncoder()
df['category_encoded'] = le.fit_transform(df['category'])

# Standardize numeric features to zero mean and unit variance.
scaler = StandardScaler()
df[['feature1', 'feature2']] = scaler.fit_transform(df[['feature1', 'feature2']])
--------------------------------
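The two key steps not shown above, one-hot encoding and the train/test split, can be sketched as follows. This is a minimal, self-contained example; the column names and the toy data are placeholders, not from the original note:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy DataFrame; column names are illustrative assumptions.
df = pd.DataFrame({
    'feature1': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    'category': ['a', 'b', 'a', 'b', 'a', 'b'],
    'target':   [0, 1, 0, 1, 0, 1],
})

# One-hot encode the categorical column (an alternative to
# LabelEncoder when categories have no inherent order).
df = pd.get_dummies(df, columns=['category'])

# Hold out one third of the rows as a test set;
# random_state makes the split reproducible.
X = df.drop(columns=['target'])
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=42
)
```

With 6 rows and `test_size=1/3`, this yields 4 training rows and 2 test rows; the one-hot step replaces `category` with `category_a` and `category_b` columns.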
