6-Deep Networks Basics - Shallow Neural Networks-29-07-2024
Learning
Key Steps and Techniques
1. Data Cleaning:
   1. Handling Missing Values: Techniques include removing records with missing values, imputing missing values, or using algorithms that support missing values.
   2. Handling Outliers: Identifying and removing or transforming outliers to prevent them from skewing the results.
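Both cleaning options can be sketched in a few lines. This is a minimal illustration, assuming the data is a list of Python dicts with `None` marking a missing value; the records, `drop_incomplete`, `clip_values`, and the clipping bounds are all hypothetical. It shows the "remove records" route for missing values and the "transform" route for outliers (capping instead of deleting).

```python
# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 25, "income": 40_000},
    {"age": None, "income": 52_000},
    {"age": 31, "income": None},
    {"age": 29, "income": 1_000_000},  # suspiciously large income
    {"age": 35, "income": 61_000},
]

def drop_incomplete(rows):
    """Remove every record that has at least one missing field."""
    return [r for r in rows if all(v is not None for v in r.values())]

def clip_values(values, low, high):
    """Transform outliers by capping them into [low, high] instead of deleting them."""
    return [min(max(v, low), high) for v in values]

complete = drop_incomplete(records)
incomes = [r["income"] for r in complete]
capped = clip_values(incomes, low=0, high=100_000)  # hypothetical bounds chosen by hand
print(len(complete), capped)  # 3 [40000, 100000, 61000]
```

Dropping rows is simple but discards information (two of five records here); capping keeps the record while limiting the outlier's influence. In pandas the same ideas correspond to `DataFrame.dropna` and `Series.clip`.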
2. Data Transformation:
   1. Normalization/Standardization: Scaling features to a similar range so that features with large numeric ranges do not dominate distance calculations or gradient-based training.
   2. Encoding Categorical Data: Converting categorical data into numerical format using techniques like one-hot encoding, label encoding, or binary encoding.
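Binary encoding is the least familiar of the three encodings listed. Below is a minimal sketch, assuming each category is first given an integer code in order of first appearance (starting at 0) and that code is then written as fixed-width bits, one output column per bit. The function name and the color data are hypothetical; library implementations may number categories differently.

```python
def binary_encode(values):
    """Binary-encode a categorical column (a sketch).

    Each category gets an integer code in order of first appearance,
    and the code is written out as fixed-width bits, one column per bit.
    """
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    # Enough bits to represent the largest code (at least one bit).
    width = max(1, (len(codes) - 1).bit_length())
    rows = []
    for v in values:
        code = codes[v]
        # Most significant bit first.
        rows.append([(code >> shift) & 1 for shift in reversed(range(width))])
    return codes, rows

colors = ["red", "green", "blue", "green", "yellow"]
codes, encoded = binary_encode(colors)
for color, bits in zip(colors, encoded):
    print(color, bits)
```

Four categories need four one-hot columns but only two binary columns, which is why binary encoding is attractive for high-cardinality features.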
3. Feature Engineering:
   1. Feature Creation: Creating new features from existing data to enhance the predictive power of the model.
   2. Feature Selection: Selecting the most relevant features to reduce dimensionality and improve model performance.
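Creation and selection can be combined in one small pipeline. This sketch assumes a hypothetical housing dataset, creates a ratio feature (`area_per_room`), and then applies a simple filter-style selection: rank every feature by the absolute Pearson correlation with the target and keep the top `k`. Correlation filtering is only one selection method, chosen here for brevity.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical housing data; price is the prediction target.
features = {
    "area_m2": [50, 80, 120, 65, 150],
    "rooms":   [2, 3, 4, 2, 5],
    "floor":   [3, 1, 2, 5, 4],
}
price = [150, 230, 350, 190, 440]

# Feature creation: a derived ratio feature.
features["area_per_room"] = [a / r for a, r in zip(features["area_m2"], features["rooms"])]

def select_k_best(features, target, k):
    """Filter-style selection: keep the k features most correlated with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(select_k_best(features, price, k=2))  # ['area_m2', 'rooms']
```

The created feature is scored like any other; here it is not among the two strongest, which illustrates that feature creation and feature selection are complementary rather than automatic wins.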
4. Data Splitting:
   1. Training and Testing Sets: Splitting the dataset into training and testing sets to evaluate the model's performance on unseen data.
   2. Cross-Validation: Using techniques like k-fold cross-validation to further assess the model's generalizability.
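The split and k-fold procedure can be sketched on sample indices alone. This is a minimal version, assuming a shuffled hold-out split followed by k-fold cross-validation on the remaining training indices; `split_indices` and `k_fold` are hypothetical helpers (scikit-learn provides `train_test_split` and `KFold` for real use).

```python
import random

def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a test fraction (hypothetical helper)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]

def k_fold(indices, k):
    """Yield (train, validation) splits; each index is validated exactly once."""
    n = len(indices)
    start = 0
    for i in range(k):
        # Spread any remainder over the first folds.
        size = n // k + (1 if i < n % k else 0)
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

train_idx, test_idx = split_indices(10, test_fraction=0.2, seed=42)
print(len(train_idx), len(test_idx))  # 8 2
for fold, (tr, va) in enumerate(k_fold(train_idx, k=4)):
    print(f"fold {fold}: train on {len(tr)}, validate on {va}")
```

In practice a model is fitted on each fold's training part, scored on its validation part, and the k scores are averaged; the test indices stay untouched until the final evaluation.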
Data Integration
• Combining data from different sources to provide a more comprehensive dataset.
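A common form of integration is joining two sources on a shared key. The sketch below assumes two hypothetical sources (a customer list and a payments table) keyed by `customer_id`, with unique keys in the right-hand table, and performs a left join in plain Python; `pandas.merge(..., on="customer_id", how="left")` does the same for DataFrames.

```python
# Hypothetical sources, both keyed by customer_id.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Chen"},
]
payments = [
    {"customer_id": 1, "total_spent": 120.0},
    {"customer_id": 3, "total_spent": 75.5},
]

def left_join(left, right, key):
    """Keep every left row and attach matching right columns (assumes unique right keys)."""
    lookup = {row[key]: row for row in right}
    right_cols = sorted({c for row in right for c in row if c != key})
    merged = []
    for row in left:
        match = lookup.get(row[key], {})
        combined = dict(row)
        for col in right_cols:
            combined[col] = match.get(col)  # None when the right source has no record
        merged.append(combined)
    return merged

for row in left_join(customers, payments, "customer_id"):
    print(row)
```

Note that integration can itself introduce missing values (Ben has no payment record), which sends the merged dataset back through the cleaning step.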
Common Techniques and Tools
1. Imputation:
   1. Using the mean, median, or mode to fill missing values.
   2. Advanced imputation techniques like K-Nearest Neighbors (KNN) imputation, which fills a missing value using the most similar records.
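Both imputation styles can be sketched with the standard library. `impute_simple` fills gaps with one statistic of the observed values; `impute_knn` fills each gap with the average of that feature over the `k` nearest rows. The KNN distance here is Euclidean over the features both rows have observed, a simplifying assumption; scikit-learn's `SimpleImputer` and `KNNImputer` are the usual production tools. Function names and data are hypothetical.

```python
import math
import statistics

def impute_simple(column, strategy="mean"):
    """Replace None with the mean, median, or mode of the observed values."""
    observed = [v for v in column if v is not None]
    fill = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy](observed)
    return [fill if v is None else v for v in column]

def impute_knn(rows, k=2):
    """Fill each None with the mean of that feature over the k nearest rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                # Compare only features observed in both rows.
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = [val for _, val in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

ages = [22, None, 35, 35, 41]
print(impute_simple(ages, "mean"))    # [22, 33.25, 35, 35, 41]
print(impute_simple(ages, "median"))  # [22, 35.0, 35, 35, 41]

rows = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 2.9],
    [5.0, 6.0, 7.0],
]
print(impute_knn(rows, k=2)[1])  # third feature filled with about 2.95
```

Because KNN imputation relies on distances, features should usually be scaled first; otherwise large-range features decide who counts as a "neighbor".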
2. Scaling:
   1. Min-Max Scaling: Rescaling each feature to the range [0, 1] using x' = (x - min) / (max - min).
   2. Standardization: Rescaling each feature to have a mean of 0 and a standard deviation of 1 using z = (x - μ) / σ.
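The two scaling formulas translate directly into code. This minimal sketch uses a hypothetical height column, treats a constant feature as all zeros to avoid dividing by zero (an assumption; libraries handle this case in their own ways), and uses the population standard deviation for σ.

```python
import statistics

def min_max_scale(xs):
    """x' = (x - min) / (max - min), mapping the feature onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in xs]  # constant feature: nothing to rescale
    return [(x - lo) / span for x in xs]

def standardize(xs):
    """z = (x - mu) / sigma, giving mean 0 and standard deviation 1."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in xs]
    return [(x - mu) / sigma for x in xs]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
print(min_max_scale(heights_cm))  # [0.0, 0.25, 0.5, 0.75, 1.0]
z = standardize(heights_cm)
print([round(v, 3) for v in z])   # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

When a dataset has been split, min, max, μ and σ should be computed on the training set only and then reused to transform the test set, so no information leaks from unseen data.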
3. Encoding Categorical Data:
   1. One-Hot Encoding: Creating a binary column for each category.
   2. Label Encoding: Assigning a unique integer to each category. Because the integers imply an order, this is best suited to ordinal categories.
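The difference between the two encodings is easiest to see side by side. This sketch assumes categories are numbered in sorted (alphabetical) order; the function names and the `sizes` data are hypothetical, and scikit-learn's `OneHotEncoder` and `LabelEncoder` are the library counterparts.

```python
def label_encode(values):
    """Map each category to an integer, in sorted category order."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return mapping, [mapping[v] for v in values]

def one_hot_encode(values):
    """Create one 0/1 column per category, in sorted category order."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

sizes = ["small", "large", "medium", "small"]
mapping, labels = label_encode(sizes)
print(mapping, labels)  # {'large': 0, 'medium': 1, 'small': 2} [2, 0, 1, 2]
cats, onehot = one_hot_encode(sizes)
print(cats)    # ['large', 'medium', 'small']
print(onehot)  # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

The alphabetical label codes put "large" below "medium" below "small", an order that contradicts the real one; that is exactly the hazard of label encoding. One-hot encoding avoids any implied order at the cost of one column per category.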
4. Handling Outliers:
   1. Z-score: Removing data points that lie more than a chosen number of standard deviations (commonly 3) from the mean.
   2. IQR Method: Removing data points below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 - Q1 is the interquartile range.
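Both outlier rules can be sketched with the standard library. The data is hypothetical, σ is the population standard deviation, and quartiles come from `statistics.quantiles` with its default "exclusive" method; other tools compute quartiles slightly differently, so the fences may shift a little.

```python
import statistics

def zscore_filter(xs, threshold=3.0):
    """Keep points whose distance from the mean is at most threshold * sigma."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [x for x in xs if abs(x - mu) <= threshold * sigma]

def iqr_filter(xs, factor=1.5):
    """Keep points inside [Q1 - factor * IQR, Q3 + factor * IQR]."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    lo, hi = q1 - factor * iqr, q3 + factor * iqr
    return [x for x in xs if lo <= x <= hi]

data = [10, 12, 11, 13, 12, 11, 95]
print(iqr_filter(data))                    # [10, 12, 11, 13, 12, 11]
print(zscore_filter(data, threshold=2.0))  # [10, 12, 11, 13, 12, 11]
print(zscore_filter(data, threshold=3.0))  # 95 survives
```

The last line shows a known weakness of the z-score rule on small samples: the outlier inflates σ, and in a sample of only seven points no value can reach |z| = 3. The IQR rule relies on quartiles, which an extreme value barely moves, so it is more robust here.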