Unit 6 AICS
Feature Engineering
Prepared by
Prof. Kusuma
What is Feature Engineering?
Feature engineering is the process of converting raw data into useful input
variables (features) that improve the performance of machine learning models.
It involves selecting and constructing the most informative features so that a
model can learn patterns and make accurate predictions.
Feature engineering encompasses methods like feature scaling,
encoding categorical variables, feature selection, and building
interaction terms.
Why is Feature Engineering Important?
Feature engineering is one of the most critical steps in
machine learning. Even the most advanced algorithms can fail if they
are trained on poorly designed features. Here’s why it matters:
1. Improves Model Accuracy
A well-engineered feature set allows a model to capture patterns more effectively,
leading to higher accuracy. For example, converting a date column into “day of the
week” or “holiday vs. non-holiday” can improve sales forecasting models.
2. Reduces Overfitting and Underfitting
By removing irrelevant or highly correlated features, feature engineering prevents the
model from memorizing noise (overfitting) and ensures it generalizes well on unseen
data.
3. Enhances Model Interpretability
Features that align with domain knowledge make the model’s decisions more
explainable. For instance, in fraud detection, a feature like “number of transactions per
hour” is more informative than raw timestamps.
4. Boosts Training Efficiency
Reducing the number of unnecessary features decreases computational complexity,
making training faster and more efficient.
5. Handles Noisy and Missing Data
Raw data is often incomplete or contains outliers. Feature engineering helps clean and
structure this data, ensuring better learning outcomes.
Feature Selection
Selecting the most relevant features while eliminating redundant,
irrelevant, or highly correlated variables helps improve model
efficiency and accuracy.
Techniques:
Filter Methods: Use statistical measures such as correlation, variance
thresholds, or mutual information to select important features.
Wrapper Methods: Use iterative search techniques such as Recursive Feature
Elimination (RFE) and stepwise selection.
Embedded Methods: Feature selection is built into the algorithm,
as in Lasso Regression (L1 regularization) or decision-tree-based
models.
Example: Removing highly correlated features like “Total Sales” and
“Average Monthly Sales” if one can be derived from the other.
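A minimal sketch of filter-style and wrapper-style selection with scikit-learn; the dataset, column names, and threshold are illustrative assumptions, not part of the original example.

```python
# Minimal sketch: filter and wrapper feature selection (data is illustrative)
import pandas as pd
from sklearn.feature_selection import VarianceThreshold, RFE
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "total_sales":       [100, 200, 150, 300],
    "avg_monthly_sales": [8.3, 16.7, 12.5, 25.0],  # derived from total_sales -> highly correlated
    "store_size":        [50, 80, 60, 120],
    "is_open":           [1, 1, 1, 1],              # zero variance -> uninformative
})
y = [0, 1, 0, 1]

# Filter method: drop near-constant features
selector = VarianceThreshold(threshold=0.0)
selector.fit(df)
print("Kept after variance filter:", df.columns[selector.get_support()].tolist())

# Filter method: inspect pairwise correlations to spot redundant columns
print(df.corr().abs())

# Wrapper method: Recursive Feature Elimination with a simple estimator
rfe = RFE(estimator=LogisticRegression(), n_features_to_select=2)
rfe.fit(df, y)
print("RFE-selected features:", df.columns[rfe.support_].tolist())
```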
Feature Transformation
Transforms raw data to improve model learning by making it more
interpretable or reducing skewness.
Techniques:
Normalization (Min-Max Scaling): Rescales values between 0 and
1. Useful for distance-based models like k-NN.
Standardization (Z-score Scaling): Transforms data to have a
mean of 0 and standard deviation of 1. Works well for gradient-based
models like logistic regression.
Log Transformation: Reduces skew in heavy-tailed data, making its
distribution closer to normal.
Power Transformation (Box-Cox, Yeo-Johnson): Used to stabilize
variance and make data more normal-like.
Example: Scaling customer income before using it in a model to
prevent high-value dominance.
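A minimal sketch of the transformations above using scikit-learn and NumPy; the income values are made up for illustration.

```python
# Minimal sketch: common feature scaling / transformation steps (data is illustrative)
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler, PowerTransformer

income = pd.DataFrame({"income": [25_000, 40_000, 52_000, 250_000]})  # right-skewed

# Normalization: rescale to [0, 1]
minmax = MinMaxScaler().fit_transform(income)

# Standardization: mean 0, standard deviation 1
zscore = StandardScaler().fit_transform(income)

# Log transform: compress large values, reduce skew
logged = np.log1p(income["income"])

# Power transform (Yeo-Johnson also handles zero/negative values)
power = PowerTransformer(method="yeo-johnson").fit_transform(income)

print(minmax.ravel(), zscore.ravel(), logged.values, power.ravel(), sep="\n")
```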
Feature Encoding
Converts categorical variables into numerical representations that machine
learning models can process.
Techniques:
One-Hot Encoding (OHE): Creates binary columns for each
category (suitable for low-cardinality categorical variables).
Label Encoding: Assigns numerical values to categories (useful for
ordinal categories like “low,” “medium,” “high”).
Target Encoding: Replaces categories with the mean target value
(commonly used in regression models).
Frequency Encoding: Converts categories into their occurrence
frequency in the dataset.
Example (one-hot encoding):
City | New York | San Francisco | Chicago
NY   | 1        | 0             | 0
SF   | 0        | 1             | 0
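A minimal sketch of the four encodings using pandas; the city and price columns are illustrative assumptions.

```python
# Minimal sketch: common categorical encodings with pandas (data is illustrative)
import pandas as pd

df = pd.DataFrame({"city": ["NY", "SF", "NY", "Chicago"],
                   "price": [300, 500, 320, 250]})

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(df["city"], prefix="city")

# Label encoding: map categories to integers (best suited to ordinal data)
df["city_label"] = df["city"].astype("category").cat.codes

# Frequency encoding: replace each category with its relative frequency
df["city_freq"] = df["city"].map(df["city"].value_counts(normalize=True))

# Target encoding: replace each category with the mean of the target
df["city_target"] = df["city"].map(df.groupby("city")["price"].mean())

print(pd.concat([df, one_hot], axis=1))
```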
Feature Creation (Derived Features)
Feature creation involves constructing new features from existing
ones to provide additional insights and improve model performance.
Well-crafted features can capture hidden relationships in data,
making patterns more evident to machine learning models.
Techniques:
Polynomial Features: Useful for models that need to capture non-
linear relationships between variables.
Example: If the relationship is not purely linear, adding polynomial terms like
x², x³, or interaction terms (x1 * x2) can improve performance.
Use Case: Predicting house prices based on features like square footage
and number of rooms. Instead of just using square footage, a model could
benefit from an interaction term like square_footage * number_of_rooms.
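A minimal sketch of generating polynomial and interaction terms with scikit-learn; the house data is invented for illustration.

```python
# Minimal sketch: polynomial and interaction features (data is illustrative)
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

houses = pd.DataFrame({"square_footage": [800, 1200, 1500],
                       "number_of_rooms": [2, 3, 4]})

# degree=2 adds x1^2, x2^2, and the interaction term x1 * x2
poly = PolynomialFeatures(degree=2, include_bias=False)
expanded = poly.fit_transform(houses)

print(poly.get_feature_names_out(houses.columns))
print(expanded)
```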
Binning (Discretization): Converts continuous variables into categorical
bins to simplify the relationship.
Example: Instead of using raw age values (22, 34, 45), we can group them into
bins:
Young (18-30)
Middle-aged (31-50)
Senior (51+)
Use Case: Credit risk modeling, where different age groups have different risk
levels.
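A minimal sketch of binning ages into the groups above with pandas; the sample ages are illustrative.

```python
# Minimal sketch: binning continuous ages into categories (bin edges as in the text)
import pandas as pd

ages = pd.Series([22, 34, 45, 63])
age_group = pd.cut(ages,
                   bins=[18, 30, 50, 120],
                   labels=["Young", "Middle-aged", "Senior"])
print(age_group)
```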
Ratio Features: Creating ratios between two related numerical values to
normalize the impact of scale.
Example: Instead of using income and loan amount separately, use Income-to-
Loan Ratio = Income / Loan Amount to standardize comparisons across different
income levels.
Use Case: Loan default prediction, where individuals with a higher debt-to-
income ratio are more likely to default.
Time-based Features: Extracts meaningful insights from
timestamps, such as:
Hour of the day (helps in traffic analysis)
Day of the week (useful for sales forecasting)
Season (important for retail and tourism industries)
Use Case: Predicting e-commerce sales by analyzing trends based on
weekdays vs. weekends.
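A minimal sketch of ratio and time-based features with pandas; the loan records and timestamps are illustrative assumptions.

```python
# Minimal sketch: ratio and time-based features (data is illustrative)
import pandas as pd

loans = pd.DataFrame({
    "income": [45_000, 80_000, 30_000],
    "loan_amount": [15_000, 20_000, 25_000],
    "application_time": pd.to_datetime(
        ["2024-03-04 09:15", "2024-07-13 18:40", "2024-12-24 11:05"]),
})

# Ratio feature: normalize loan size by income
loans["income_to_loan_ratio"] = loans["income"] / loans["loan_amount"]

# Time-based features extracted from the timestamp
loans["hour"] = loans["application_time"].dt.hour
loans["day_of_week"] = loans["application_time"].dt.dayofweek  # 0 = Monday
loans["is_weekend"] = loans["day_of_week"].isin([5, 6]).astype(int)
loans["month"] = loans["application_time"].dt.month            # proxy for season

print(loans)
```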
Example: Adding a missing-value indicator for Salary:
Customer ID | Age | Salary | Salary Missing Indicator
101         | 35  | 50,000 | 0
102         | 42  | NaN    | 1
103         | 29  | 40,000 | 0
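A minimal sketch reproducing the table above with pandas; the median imputation step is an illustrative choice, not prescribed by the text.

```python
# Minimal sketch: flagging and imputing a missing salary value
import numpy as np
import pandas as pd

df = pd.DataFrame({"customer_id": [101, 102, 103],
                   "age": [35, 42, 29],
                   "salary": [50_000, np.nan, 40_000]})

# Flag missing values, then impute (median used here as an illustrative choice)
df["salary_missing_indicator"] = df["salary"].isna().astype(int)
df["salary"] = df["salary"].fillna(df["salary"].median())
print(df)
```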
Feature Extraction
Feature extraction involves deriving new, meaningful representations from complex
data formats like text, images, and time-series. This is especially useful in high-
dimensional datasets.
Techniques:
Text Features: Convert textual data into numerical form for machine learning
models.
Bag of Words (BoW): Represents text as word frequencies in a matrix.
TF-IDF (Term Frequency-Inverse Document Frequency): Gives importance to words
based on their frequency in a document vs. overall dataset.
Word Embeddings (Word2Vec, GloVe, BERT): Captures semantic meaning of words.
Use Case: Sentiment analysis of customer reviews.
Image Features: Extract essential patterns from images.
Edge Detection: Identifies object boundaries in images (useful in medical imaging).
Histogram of Oriented Gradients (HOG): Used in object detection.
CNN-based Feature Extraction: Uses deep learning models like ResNet and VGG for
automatic feature learning.
Use Case: Facial recognition, self-driving car object detection.
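A minimal sketch of classical image feature extraction with scikit-image; the random array stands in for a real grayscale image, and the HOG parameters are common defaults rather than values from the text.

```python
# Minimal sketch: edge and HOG features (input image is a random stand-in)
import numpy as np
from skimage.feature import hog
from skimage.filters import sobel

image = np.random.rand(64, 64)  # stand-in for a real grayscale image

# Edge detection: highlights object boundaries
edges = sobel(image)

# Histogram of Oriented Gradients: fixed-length descriptor used in object detection
hog_descriptor = hog(image, orientations=9,
                     pixels_per_cell=(8, 8), cells_per_block=(2, 2))

print(edges.shape, hog_descriptor.shape)
```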
Time-Series Features: Extract meaningful trends and seasonality
from time-series data.
Rolling Averages: Smooth out short-term fluctuations.
Seasonal Decomposition: Separates trend, seasonality, and residual
components.
Autoregressive Features: Uses past values as inputs for predictive
models.
Use Case: Forecasting electricity demand based on historical consumption
patterns.
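A minimal sketch of rolling-average and lag (autoregressive) features with pandas; the demand series is invented for illustration.

```python
# Minimal sketch: rolling and lag features for a time series (data is illustrative)
import pandas as pd

demand = pd.DataFrame(
    {"load_mw": [310, 295, 330, 360, 340, 355, 400]},
    index=pd.date_range("2024-01-01", periods=7, freq="D"))

# Rolling average: smooths short-term fluctuations
demand["rolling_mean_3d"] = demand["load_mw"].rolling(window=3).mean()

# Autoregressive (lag) feature: yesterday's value as an input
demand["lag_1"] = demand["load_mw"].shift(1)

print(demand)
```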
Dimensionality Reduction (PCA, t-SNE, UMAP):
PCA (Principal Component Analysis) reduces high-dimensional data
while preserving variance.
t-SNE and UMAP are useful for visualizing clusters in large datasets.
Use Case: Reducing thousands of customer behavior variables into a few
principal components for clustering.
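A minimal sketch of PCA on standardized data with scikit-learn; the random matrix stands in for real customer behavior variables, and the number of components is an illustrative choice.

```python
# Minimal sketch: PCA after standardization (data is a random stand-in)
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.rand(200, 50)                   # 200 customers, 50 behavior variables

X_scaled = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale
pca = PCA(n_components=5)                     # keep 5 principal components
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                        # (200, 5)
print(pca.explained_variance_ratio_.sum())    # share of variance retained
```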
Example:
For text analysis, TF-IDF converts raw sentences into numerical form:
Term         | "AI is transforming healthcare" | "AI is advancing research"
AI           | 0.4                             | 0.3
transforming | 0.6                             | 0.0
research     | 0.0                             | 0.7
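A minimal sketch of computing TF-IDF vectors for the two example sentences with scikit-learn; the resulting scores will differ from the illustrative values in the table above.

```python
# Minimal sketch: TF-IDF vectorization of the two example sentences
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = ["AI is transforming healthcare", "AI is advancing research"]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(sentences)

print(vectorizer.get_feature_names_out())
print(tfidf_matrix.toarray().round(2))
```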
Handling Outliers
Techniques:
Winsorization: Replaces extreme values with a specified percentile
(e.g., capping values at the 5th and 95th percentile).
Z-score Method: Removes values that are more than a certain
number of standard deviations from the mean (e.g., ±3σ).
IQR (Interquartile Range) Method: Removes values that fall more than 1.5
times the interquartile range below Q1 or above Q3.
Transformations (Log, Square Root): Reduces the impact of
extreme values by adjusting scale.
Example:
Employee | Salary  | Outlier (IQR Method)
A        | 50,000  | No
B        | 52,000  | No
C        | 200,000 | Yes
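A minimal sketch of IQR flagging and winsorization with pandas and NumPy; the salary list extends the three-row table above with a few invented values so the IQR bounds are meaningful.

```python
# Minimal sketch: IQR outlier flagging and winsorization (salaries are illustrative)
import numpy as np
import pandas as pd

salaries = pd.Series([48_000, 50_000, 52_000, 51_000, 49_500, 200_000])

# IQR method: flag values beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR
q1, q3 = salaries.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (salaries < q1 - 1.5 * iqr) | (salaries > q3 + 1.5 * iqr)

# Winsorization: cap values at the 5th and 95th percentiles instead of removing them
lower, upper = np.percentile(salaries, [5, 95])
winsorized = salaries.clip(lower, upper)

print(pd.DataFrame({"salary": salaries,
                    "outlier_iqr": is_outlier,
                    "winsorized": winsorized}))
```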
Feature Interaction