AI-Module 4 Updated
2. Transformation:
• It involves adjusting the predictor variables to improve the accuracy and
performance of the model.
• It ensures that all the variables are on the same scale, making the model
easier to understand.
• It ensures that all the features are within the acceptable range to avoid
any computational error.
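As a minimal sketch of one common transformation, min-max scaling maps a feature into the range [0, 1] so that all variables end up on the same scale (the `ages` values below are made up for illustration):

```python
import numpy as np

# Hypothetical feature whose raw values span a wide range
ages = np.array([22.0, 35.0, 58.0, 41.0])

# Min-max scaling: map every value into [0, 1]
scaled = (ages - ages.min()) / (ages.max() - ages.min())
```

After scaling, the smallest value becomes 0 and the largest becomes 1, which keeps all features within an acceptable, comparable range.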
18/03/2024 Prof. Trupthi Rao, Dept. of AI & DS, GAT 4
3. Feature Extraction:
• Feature extraction is an automated feature engineering process that
generates new variables by extracting them from the raw data.
• The aim is to reduce the volume of data so that it can be easily used and
managed for data modelling.
• Feature extraction methods include cluster analysis, text analytics, edge
detection algorithms, and principal components analysis (PCA).
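Of the methods listed above, PCA can be sketched in a few lines: centre the data, then project it onto its top principal components (the small 5x3 matrix below is made-up data; this is an illustrative sketch, not a full PCA implementation):

```python
import numpy as np

# Made-up data set: 5 samples, 3 correlated features
X = np.array([[2.5, 2.4, 1.1],
              [0.5, 0.7, 0.3],
              [2.2, 2.9, 1.0],
              [1.9, 2.2, 0.9],
              [3.1, 3.0, 1.4]])

# PCA via SVD: centre the data, then keep the top 2 principal components
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:2].T   # new extracted features: 3 columns -> 2
```

The extracted features are linear combinations of the originals, reducing the volume of data while retaining most of its variance.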
4. Feature Selection:
• Feature selection is a way of selecting the subset of the most relevant
features from the original features set by removing the redundant,
irrelevant, or noisy features.
• This is done in order to reduce overfitting in the model and improve the
performance.
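One simple feature-selection rule is a variance threshold: a feature that is nearly constant carries little information and can be dropped. A minimal sketch on made-up data (the threshold 0.05 is an arbitrary choice for illustration):

```python
import numpy as np

# Made-up feature matrix: the middle column is almost constant (uninformative)
X = np.array([[1.0, 5.0, 0.2],
              [2.0, 5.0, 0.9],
              [3.0, 5.1, 0.4],
              [4.0, 5.0, 0.7]])

# Keep only features whose variance exceeds a chosen threshold
variances = X.var(axis=0)
keep = variances > 0.05
X_selected = X[:, keep]
```

Here the near-constant column is removed, leaving a smaller feature subset for the model.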
Feature Engineering Techniques
1. Imputation:
• Imputation deals with handling missing values in data.
• Deleting records with missing values is one way of dealing with the
missing-data issue, but it can mean losing a chunk of valuable data. This is
where imputation can help.
• Data imputation can be classified into two types:
Categorical Imputation: Missing categorical values are generally
replaced by the most commonly occurring value (mode) of the feature.
Numerical Imputation: Missing numerical values are generally replaced
by the mean or median of the corresponding feature.
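Both kinds of imputation can be sketched with the standard library and NumPy (the `incomes` and `colours` data below are made up; median and mode are the strategies named above):

```python
import numpy as np
from collections import Counter

# Numerical imputation: replace missing values with the feature median
incomes = np.array([30.0, np.nan, 50.0, 40.0])
incomes[np.isnan(incomes)] = np.nanmedian(incomes)

# Categorical imputation: replace missing values with the mode
colours = ["red", "blue", None, "red"]
mode = Counter(c for c in colours if c is not None).most_common(1)[0][0]
colours = [c if c is not None else mode for c in colours]
```

After imputation, no records need to be deleted, so no data is lost.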
• Example 2: A timestamp is split into 6 different attributes (year, month,
day, hour, minute, second).
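This split can be sketched with the standard library (the specific timestamp below is made up):

```python
from datetime import datetime

ts = datetime(2024, 3, 18, 10, 45, 30)  # a made-up timestamp

# Split one timestamp into six separate attributes
features = {
    "year": ts.year, "month": ts.month, "day": ts.day,
    "hour": ts.hour, "minute": ts.minute, "second": ts.second,
}
```

Each of the six attributes can then be used as its own feature, letting a model pick up on, say, hour-of-day patterns.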
1. Supervised Machine Learning Algorithms:
• The primary purpose of supervised learning is to learn from labeled
sample data in order to make predictions on unavailable, future or unseen
data.
• Supervised learning is where there are input variables (x) and an
output variable (Y) and an algorithm is used to learn the mapping
function from the input to the output Y = f(x) .
• The goal is to approximate the mapping function so well that when
there comes a new input data (x), the machine should be able to
predict the output variable (Y) for that data.
• Supervised machine learning includes two major
processes: classification and regression.
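A minimal sketch of learning the mapping Y = f(x) from labeled data, using a least-squares line fit (the data below is made up, generated from Y = 2x + 1):

```python
import numpy as np

# Labeled sample data: inputs x and outputs Y, generated from Y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
Y = 2 * x + 1

# Learn the mapping f by fitting a line Y = a*x + b with least squares
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, Y, rcond=None)

# Predict the output for a new, unseen input x = 4.0
y_new = a * 4.0 + b
```

The fitted coefficients approximate f well enough that the model predicts the correct output (9.0) for an input it never saw during training.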
Classification is the process of categorizing a set of data into classes
(yes/no, true/false, 0/1, or multi-way labels such as yes/no/maybe). There are
various types of classification problems, such as: binary classification,
multi-class classification, and multi-label classification. Examples of
classification problems include: spam filtering, image classification,
sentiment analysis, classifying cancerous and non-cancerous tumors, and
customer churn prediction.
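As a toy sketch of binary classification, a nearest-centroid classifier assigns a new point to whichever class mean it is closest to (the one-feature data and the 0 = ham / 1 = spam labels below are made up; real spam filters use far richer features):

```python
import numpy as np

# Made-up binary training data: one feature, two classes (0 = ham, 1 = spam)
X_train = np.array([[0.1], [0.2], [0.8], [0.9]])
y_train = np.array([0, 0, 1, 1])

# Nearest-centroid classifier: compute the mean point of each class
centroids = {c: X_train[y_train == c].mean(axis=0) for c in (0, 1)}

def predict(x):
    # Assign x to the class whose centroid is closest
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

label = predict(np.array([0.85]))
```

A new input near the class-1 examples is labeled 1, and one near the class-0 examples is labeled 0.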
Dimensionality reduction: Most of the time, there is a lot of noise in the incoming data.
Machine learning algorithms use dimensionality reduction to remove this noise while distilling
the relevant information. Examples: image compression, or compressing a high-dimensional
representation of emails before classifying them into "spam" and "not spam".
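The noise-removal idea can be sketched with PCA-style reconstruction: project the data onto its top principal component and map it back, so variation off that component (the noise) is discarded. The 2-D data below is made up, lying near a line plus small noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 50 points near the line y = 2x, plus small noise
t = rng.uniform(-1, 1, size=50)
X = np.column_stack([t, 2 * t]) + rng.normal(0, 0.05, size=(50, 2))

# Keep only the top principal component, then reconstruct:
# the component of the noise off that direction is removed
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_denoised = (Xc @ Vt[:1].T) @ Vt[:1] + X.mean(axis=0)
```

The reconstructed points have the same shape as the originals but lie exactly on the dominant direction, with the off-direction noise distilled away.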
• The most widely used unsupervised algorithms are:
K-means clustering
PCA (Principal Component Analysis)
Association rule mining.