Data Preprocessing in Machine Learning
Key Steps and Techniques
Data Cleaning
1. Data Cleaning:
   1. Handling Missing Values: Techniques include removing records with missing values, imputing missing values, or using algorithms that support missing values.
   2. Handling Outliers: Identifying and removing or transforming outliers to prevent them from skewing the results.
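A minimal pandas sketch of these cleaning options; the DataFrame, column names, and the percentile-clipping thresholds are hypothetical choices for illustration, not prescribed by the slides.

```python
import pandas as pd

# Hypothetical toy data: one missing age, one implausibly large income.
df = pd.DataFrame({
    "age": [25, 32, None, 41, 38],
    "income": [48_000, 52_000, 61_000, 1_000_000, 58_000],
})

# Handling missing values, option 1: remove records with missing values.
dropped = df.dropna()

# Handling missing values, option 2: impute, e.g. with the column median.
imputed = df.fillna({"age": df["age"].median()})

# Handling outliers by transforming rather than removing:
# clip income to its 1st-99th percentile range (the range is an assumption).
low, high = imputed["income"].quantile([0.01, 0.99])
imputed["income"] = imputed["income"].clip(lower=low, upper=high)

print(imputed)
```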
Data Transformation
2. Data Transformation:
   1. Normalization/Standardization: Scaling features to a similar range to ensure that no single feature dominates the model's performance.
   2. Encoding Categorical Data: Converting categorical data into a numerical format using techniques like one-hot encoding, label encoding, or binary encoding.
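A short scikit-learn and pandas sketch of both transformation steps, assuming a hypothetical mixed-type DataFrame; the column names and values are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: two numeric features on very different scales plus one categorical feature.
df = pd.DataFrame({
    "height_cm": [150.0, 165.0, 180.0, 172.0],
    "salary": [30_000.0, 45_000.0, 120_000.0, 60_000.0],
    "city": ["Delhi", "Mumbai", "Delhi", "Chennai"],
})

num_cols = ["height_cm", "salary"]

# Standardization: rescale each numeric feature to mean 0, standard deviation 1.
df_std = df.copy()
df_std[num_cols] = StandardScaler().fit_transform(df[num_cols])

# Normalization (min-max): rescale each numeric feature into the range [0, 1].
df_minmax = df.copy()
df_minmax[num_cols] = MinMaxScaler().fit_transform(df[num_cols])

# One-hot encoding: one binary column per city category.
df_encoded = pd.get_dummies(df_std, columns=["city"])
print(df_encoded)
```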
Feature Engineering
3. Feature Engineering:
   1. Feature Creation: Creating new features from existing data to enhance the predictive power of the model.
   2. Feature Selection: Selecting the most relevant features to reduce dimensionality and improve model performance (see the sketch after this list).
4. Data Splitting:
   1. Training and Testing Sets: Splitting the dataset into training and testing sets to evaluate the model's performance on unseen data.
   2. Cross-Validation: Using techniques like k-fold cross-validation to further assess the model's generalizability.
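The sketch below ties steps 3 and 4 together on synthetic data: a derived feature, univariate feature selection, a hold-out split, and 5-fold cross-validation. SelectKBest with the ANOVA F-test and logistic regression are illustrative choices, not the only options.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Hypothetical tabular data: 200 samples, 6 numeric features, binary target.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 6)), columns=[f"f{i}" for i in range(6)])
y = (X["f0"] + X["f1"] > 0).astype(int)   # target driven by two of the features

# Feature creation: derive a new feature from existing ones.
X["f0_times_f1"] = X["f0"] * X["f1"]

# Feature selection: keep the k most relevant features (ANOVA F-test here).
X_selected = SelectKBest(f_classif, k=3).fit_transform(X, y)

# Training and testing sets: hold out unseen data for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X_selected, y, test_size=0.2, random_state=42
)

# k-fold cross-validation on the training portion to gauge generalizability.
model = LogisticRegression()
scores = cross_val_score(model, X_train, y_train, cv=5)
print("CV accuracy per fold:", scores)
print("Held-out accuracy:", model.fit(X_train, y_train).score(X_test, y_test))
```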
Data Integration
• Combining data from different sources to provide a more comprehensive dataset.
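A minimal illustration of integration with pandas, assuming two hypothetical sources that share a customer_id key; the tables and the left-join/aggregate choice are examples, not a fixed recipe.

```python
import pandas as pd

# Two hypothetical sources describing the same customers.
demographics = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "age": [34, 28, 45],
})
purchases = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [120.0, 35.5, 60.0, 410.0],
})

# Integrate by aggregating one source and joining on the shared key.
spend = purchases.groupby("customer_id", as_index=False)["amount"].sum()
combined = demographics.merge(spend, on="customer_id", how="left")
print(combined)
```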
Common Techniques and Tools
1. Imputation:
   1. Using the mean, median, or mode to fill missing values.
   2. Advanced imputation techniques like K-Nearest Neighbors (KNN) imputation (sketched after this list).
2. Scaling:
   1. Min-Max Scaling: Scaling features to a range between 0 and 1.
   2. Standardization: Scaling features to have a mean of 0 and a standard deviation of 1.
3. Encoding Categorical Data:
   1. One-Hot Encoding: Creating binary columns for each category.
   2. Label Encoding: Assigning a unique integer to each category.
4. Handling Outliers:
   1. Z-score: Removing data points that are more than a certain number of standard deviations away from the mean.
   2. IQR Method: Removing data points that fall outside the interquartile range.
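For the items above not already shown, here is a sketch of KNN imputation and the two outlier rules on synthetic data; the 3-standard-deviation cutoff and the 1.5×IQR multiplier are the usual conventions, and the column names are made up.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)

# Hypothetical numeric data: 200 rows, with one missing value and one extreme point injected.
df = pd.DataFrame({"x1": rng.normal(50, 5, 200), "x2": rng.normal(0, 1, 200)})
df.loc[0, "x1"] = np.nan    # a missing value
df.loc[1, "x1"] = 500.0     # an obvious outlier

# KNN imputation: fill the missing value using the 5 most similar rows.
imputed = pd.DataFrame(
    KNNImputer(n_neighbors=5).fit_transform(df), columns=df.columns
)

# Z-score rule: keep rows within 3 standard deviations of the mean in every column.
z = (imputed - imputed.mean()) / imputed.std()
kept_z = imputed[(z.abs() <= 3).all(axis=1)]

# IQR rule: keep rows inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] in every column.
q1, q3 = imputed.quantile(0.25), imputed.quantile(0.75)
iqr = q3 - q1
inside = (imputed >= q1 - 1.5 * iqr) & (imputed <= q3 + 1.5 * iqr)
kept_iqr = imputed[inside.all(axis=1)]

print(len(imputed), len(kept_z), len(kept_iqr))
```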
