
Unit – 1

Advanced Algorithms in AI&ML (AAM)


Model Selection
In machine learning, the process of selecting the best model or algorithm from a set of candidate models for a given problem is referred to as model selection. It entails assessing and comparing various models according to how well they perform and choosing the one that achieves the highest accuracy or predictive power.
Because different models have varied levels of complexity, underlying assumptions, and
capabilities, model selection is a crucial stage in the machine-learning pipeline. Finding a model
that fits the training set of data well and generalizes well to new data is the objective. While a
model that is too complex may overfit the data and be unable to generalize, a model that is too
simple could underfit the data and do poorly in terms of prediction.
The following steps are frequently included in the model selection process (a short scikit-learn sketch illustrating the training, evaluation, and comparison steps follows the list):

 Problem formulation: Clearly express the issue at hand, including the kind of predictions
or task that you'd like the model to carry out (for example, classification, regression, or
clustering).
 Candidate model selection: Pick a group of models that are appropriate for the issue at
hand. These models can include straightforward methods like decision trees or linear
regression as well as more sophisticated ones like deep neural networks, random forests,
or support vector machines.
 Performance evaluation: Establish metrics for measuring how well each model performs. Common metrics include accuracy, precision, recall, F1-score, mean squared error, and the area under the receiver operating characteristic curve (AUC-ROC). The type of problem and the particular requirements will determine which metrics are used.
 Training and evaluation: Each candidate model should be trained using a subset of the
available data (the training set), and its performance should be assessed using a different
subset (the validation set or via cross-validation). The established evaluation measures are
used to gauge the model's effectiveness.
 Model comparison: Evaluate the performance of various models and determine which one
performs best on the validation set. Take into account elements like data handling
capabilities, interpretability, computational difficulty, and accuracy.
 Hyperparameter tuning: Before training, many models require that certain
hyperparameters, such as the learning rate, regularisation strength, or the number of layers
that are hidden in a neural network, be configured. Use methods like grid search, random
search, and Bayesian optimization to identify these hyperparameters' ideal values.
 Final model selection: After the models have been analyzed and fine-tuned, pick the
model that performs the best. Then, this model can be used to make predictions based on
fresh, unforeseen data.
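To make the training, evaluation, and comparison steps above concrete, here is a minimal scikit-learn sketch. It is only an illustration under assumptions, not a prescribed recipe: the built-in breast-cancer dataset stands in for the problem, and the three candidate models and the accuracy metric are arbitrary choices.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier

    # Stand-in classification problem
    X, y = load_breast_cancer(return_X_y=True)

    # Candidate models of varying complexity
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=5000),
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "random_forest": RandomForestClassifier(random_state=0),
    }

    # Train and evaluate each candidate with 5-fold cross-validation,
    # then compare mean accuracy to pick the best-performing model.
    for name, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        print(f"{name}: mean accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")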
Model Selection in machine learning:
Model selection in machine learning is the process of selecting the best algorithm and model
architecture for a specific job or dataset. It entails assessing and contrasting various models to
identify the one that best fits the data & produces the best results. Model complexity, data handling
capabilities, and generalizability to new examples are all taken into account while choosing a
model. Models are evaluated and contrasted using methods like cross-validation, and grid search,
as well as indicators like accuracy and mean squared error. Finding a model that balances
complexity and performance to produce reliable predictions and strong generalization abilities is
the aim of model selection.
There are numerous important considerations to bear in mind while selecting a model for machine
learning. These factors assist in ensuring that the chosen model is effective in solving the issue at
its core and has an opportunity for outstanding performance. Here are some crucial things to
remember:
 The complexity of the issue: Determine how complex the issue you're trying to resolve
is. Simple models might effectively solve some issues, but more complicated models can
be necessary to fully represent complex relationships in the data. Take into account the size
of the dataset, the complexity of the input features, and any potential for non-linear
connections.
 Data Availability & Quality: Consider the accessibility and caliber of the data you
already have. Using complicated models with a lot of parameters on a limited dataset may
result in overfitting. Such situations may call for simpler models with fewer parameters.
Take into account missing data, outliers, and noise as well as how various models respond
to these difficulties.
 Interpretability: Consider whether the model's interpretability is crucial in your particular
setting. Some models, like decision trees or linear regression, offer interpretability by
giving precise insights into the correlations between the input data and the desired outcome.
Complex models, such as neural networks, may perform better but offer less
interpretability.
 Model Assumptions: Recognise the presumptions that various models make. For instance,
although decision trees assume piecewise constant relationships, linear regression assumes
a linear relationship between the input characteristics and the target variable. Make sure
the model you choose is consistent with the fundamental presumptions underpinning the
data and the issue.
 Scalability and Efficiency: If you're working with massive datasets or real-time
applications, take the model's scalability and computing efficiency into consideration.
Deep neural networks and support vector machines are two examples of models that could
need a lot of time and computing power to train.
 Regularisation and Generalisation: Assess the model's capacity to apply to fresh,
untested data. By adding penalty terms to the objective function of the model, regularisation approaches like L1 or L2 regularisation can help prevent overfitting. When the training data is limited, regularised models may perform better in terms of generalization.
 Domain Expertise: Consider your expertise and domain knowledge. On the basis of
previous knowledge of the data or particular features of the domain, consider if particular
models are appropriate for the task. Models that are more likely to capture important
patterns can be found by using domain expertise to direct the selection process.
 Resource Constraints: Take into account any resource limitations you may have, such as
constrained memory space, processing speed, or time. Make sure that the chosen model can be
successfully implemented using the resources at hand. Some models require significant
resources during training or inference.
 Ensemble Methods: Examine the potential advantages of ensemble methods, which
integrate the results of various models in order to perform more effectively. By utilizing
the diversity of several models' predictions, ensemble approaches, such as bagging,
boosting, and stacking, frequently outperform individual models.
 Evaluation and Experimentation: Conduct thorough experimentation and assessment of several models. Use appropriate evaluation criteria and statistical tests to compare their performance. To evaluate the models' performance on unseen data and reduce the danger of overfitting, use hold-out validation or cross-validation.
Model Selection Techniques
Model selection in machine learning can be done using a variety of methods and tactics. These
methods assist in comparing and assessing many models to determine which is best suited to solve
a certain issue. Here are some methods for selecting models that are frequently used:
 Train-Test Split: With this strategy, the available data is divided into two sets: a training
set & a separate test set. The models are evaluated using a predetermined evaluation metric
on the test set after being trained on the training set. This method offers a quick and easy
way to evaluate a model's performance using hypothetical data.
 Cross-Validation: A resampling approach called cross-validation divides the data into
various groups or folds. Each fold is used in turn as the test set while the remaining folds form the training set, and the models are trained and evaluated on each split separately. Lowering the variance in the evaluation makes it easier to obtain an accurate assessment of the model's performance. Cross-validation techniques that are frequently used include leave-one-out,
stratified, and k-fold cross-validation.
 Grid Search: Hyperparameter tuning is done using the grid search technique. In order to
do this, a grid containing hyperparameter values must be defined, and all potential
hyperparameter combinations must be thoroughly searched. For each combination, the
models are trained, assessed, and their performances are contrasted. Finding the ideal
hyperparameter settings to optimize the model's performance is made easier by grid search.
 Random Search: A set distribution for hyperparameter values is sampled at random as
part of the random search hyperparameter tuning technique. In contrast to grid search,
which considers every potential combination, random search only investigates a portion of
the hyperparameter field. When a thorough search is not possible due to the size of the
search space, this strategy can be helpful.
 Bayesian optimization: A more sophisticated method of hyperparameter tweaking,
Bayesian optimization. It models the relationship between the performance of the model
and the hyperparameters using a probabilistic model. It intelligently chooses which set of
hyperparameters to investigate next by updating the probabilistic model and iteratively
assessing the model's performance. When the search space is big and expensive to examine,
Bayesian optimization is especially effective.
 Model averaging: This technique combines forecasts from various models to get a single
prediction. For regression issues, this can be accomplished by averaging the predictions,
while for classification problems, voting or weighted voting systems can be used. Model
averaging can increase overall prediction accuracy by lowering the bias and variation of
individual models.
 Information Criteria: Information criteria offer a numerical assessment of the trade-off
between model complexity and goodness of fit. Examples include the Akaike Information
Criterion (AIC) and the Bayesian Information Criterion (BIC). These criteria discourage
the use of too complicated models and encourage the adoption of simpler models that
adequately explain the data.
 Domain Expertise & Prior Knowledge: Prior understanding of the problem and the data,
as well as domain expertise, can have a significant impact on model choice. The models
that are more suitable given the specifics of the problem and the details of the data may be
known by subject matter experts.
 Model Performance Comparison: Using the right assessment measures, it is vital to evaluate the performance of various models. Depending on the issue at hand, these measures could include accuracy, precision, recall, F1-score, mean squared error, or the area under the receiver operating characteristic curve (AUC-ROC). The best-performing model can be found by comparing many models. A short scikit-learn sketch combining cross-validated grid search with a held-out test set follows this list.
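As referenced above, the following is a minimal scikit-learn sketch that combines a train-test split, 5-fold cross-validation, and grid search for hyperparameter tuning. The random forest model, the hyperparameter grid, and the accuracy metric are illustrative assumptions, not fixed choices.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    # Hyperparameter grid searched exhaustively with 5-fold cross-validation
    param_grid = {"n_estimators": [50, 100, 200], "max_depth": [None, 5, 10]}
    search = GridSearchCV(RandomForestClassifier(random_state=0),
                          param_grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)

    print("Best hyperparameters:", search.best_params_)
    print("Held-out test accuracy:", search.score(X_test, y_test))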

What is Feature Engineering?


Feature engineering is the process of transforming raw data into features that are suitable
for machine learning models. In other words, it is the process of selecting, extracting, and
transforming the most relevant features from the available data to build more accurate and
efficient machine learning models.
The success of machine learning models heavily depends on the quality of the features used to
train them. Feature engineering involves a set of techniques that enable us to create new
features by combining or transforming the existing ones. These techniques help to highlight the
most important patterns and relationships in the data, which in turn helps the machine learning
model to learn from the data more effectively.
What is a Feature?
In the context of machine learning, a feature (also known as a variable or attribute) is an
individual measurable property or characteristic of a data point that is used as input for a
machine learning algorithm. Features can be numerical, categorical, or text-based, and they
represent different aspects of the data that are relevant to the problem at hand.
 For example, in a dataset of housing prices, features could include the number of
bedrooms, the square footage, the location, and the age of the property. In a dataset of
customer demographics, features could include age, gender, income level, and occupation.
 The choice and quality of features are critical in machine learning, as they can greatly
impact the accuracy and performance of the model.
Need for Feature Engineering in Machine Learning?

We engineer features for various reasons, and some of the main reasons include:
 Improve User Experience: The primary reason we engineer features is to enhance the user
experience of a product or service. By adding new features, we can make the product more
intuitive, efficient, and user-friendly, which can increase user satisfaction and engagement.
 Competitive Advantage: Another reason we engineer features is to gain a competitive
advantage in the marketplace. By offering unique and innovative features, we can
differentiate our product from competitors and attract more customers.
 Meet Customer Needs: We engineer features to meet the evolving needs of customers. By
analyzing user feedback, market trends, and customer behavior, we can identify areas
where new features could enhance the product’s value and meet customer needs.
 Increase Revenue: Features can also be engineered to generate more revenue. For
example, a new feature that streamlines the checkout process can increase sales, or a
feature that provides additional functionality could lead to more upsells or cross-sells.
 Future-Proofing: Engineering features can also be done to future-proof a product or
service. By anticipating future trends and potential customer needs, we can develop
features that ensure the product remains relevant and useful in the long term.
Processes Involved in Feature Engineering
Feature engineering in Machine learning consists of mainly 5 processes: Feature Creation,
Feature Transformation, Feature Extraction, Feature Selection, and Feature Scaling. It is an
iterative process that requires experimentation and testing to find the best combination of
features for a given problem. The success of a machine learning model largely depends on the
quality of the features used in the model.
1. Feature Creation
Feature Creation is the process of generating new features based on domain knowledge or by
observing patterns in the data. It is a form of feature engineering that can significantly improve
the performance of a machine-learning model.
Types of Feature Creation:
1. Domain-Specific: Creating new features based on domain knowledge, such as creating
features based on business rules or industry standards.
2. Data-Driven: Creating new features by observing patterns in the data, such as calculating
aggregations or creating interaction features.
3. Synthetic: Generating new features by combining existing features or synthesizing new data
points.
Why Feature Creation?
1. Improves Model Performance: By providing additional and more relevant information to
the model, feature creation can increase the accuracy and precision of the model.
2. Increases Model Robustness: By adding additional features, the model can become more
robust to outliers and other anomalies.
3. Improves Model Interpretability: By creating new features, it can be easier to understand
the model’s predictions.
4. Increases Model Flexibility: By adding new features, the model can be made more flexible
to handle different types of data.
2. Feature Transformation
Feature Transformation is the process of transforming the features into a more suitable
representation for the machine learning model. This is done to ensure that the model can
effectively learn from the data.
Types of Feature Transformation:
1. Normalization: Rescaling the features to have a similar range, such as between 0 and 1, to
prevent some features from dominating others.
2. Scaling: Rescaling numerical features to a similar scale, such as having a standard deviation of 1, so that they can be compared more easily and the model considers all features equally.
3. Encoding: Transforming categorical features into a numerical representation. Examples are
one-hot encoding and label encoding.
4. Transformation: Transforming the features using mathematical operations to change the
distribution or scale of the features. Examples are logarithmic, square root, and reciprocal
transformations.
Why Feature Transformation?
1. Improves Model Performance: By transforming the features into a more suitable
representation, the model can learn more meaningful patterns in the data.
2. Increases Model Robustness: Transforming the features can make the model more robust to
outliers and other anomalies.
3. Improves Computational Efficiency: The transformed features often require fewer
computational resources.
4. Improves Model Interpretability: By transforming the features, it can be easier to
understand the model’s predictions.
3. Feature Extraction
Feature Extraction is the process of creating new features from existing ones to provide more
relevant information to the machine learning model. This is done by transforming, combining,
or aggregating existing features.
Types of Feature Extraction:
1. Dimensionality Reduction: Reducing the number of features by transforming the data into a
lower-dimensional space while retaining important information. Examples are PCA and t-
SNE.
2. Feature Combination: Combining two or more existing features to create a new one. For
example, the interaction between two features.
3. Feature Aggregation: Aggregating features to create a new one. For example, calculating
the mean, sum, or count of a set of features.
4. Feature Transformation: Transforming existing features into a new representation. For
example, log transformation of a feature with a skewed distribution.
Why Feature Extraction?
1. Improves Model Performance: By creating new and more relevant features, the model can
learn more meaningful patterns in the data.
2. Reduces Overfitting: By reducing the dimensionality of the data, the model is less likely to
overfit the training data.
3. Improves Computational Efficiency: The transformed features often require fewer
computational resources.
4. Improves Model Interpretability: By creating new features, it can be easier to understand
the model’s predictions.
4. Feature Selection
Feature Selection is the process of selecting a subset of relevant features from the dataset to be
used in a machine-learning model. It is an important step in the feature engineering process as
it can have a significant impact on the model’s performance.
Types of Feature Selection:
1. Filter Method: Based on the statistical measure of the relationship between the feature and
the target variable. Features with a high correlation are selected.
2. Wrapper Method: Based on the evaluation of the feature subset using a specific machine
learning algorithm. The feature subset that results in the best performance is selected.
3. Embedded Method: Based on the feature selection as part of the training process of the
machine learning algorithm.
Why Feature Selection?
1. Reduces Overfitting: By using only the most relevant features, the model can generalize
better to new data.
2. Improves Model Performance: Selecting the right features can improve the accuracy,
precision, and recall of the model.
3. Decreases Computational Costs: A smaller number of features requires less computation
and storage resources.
4. Improves Interpretability: By reducing the number of features, it is easier to understand and
interpret the results of the model.
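As a concrete illustration of the filter method described above, the following is a minimal scikit-learn sketch that scores each feature against the target and keeps the ten highest-scoring ones. The dataset, the ANOVA F-test scoring function, and k=10 are illustrative assumptions.

    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = load_breast_cancer(return_X_y=True)

    # Filter method: score every feature against the target, keep the top 10
    selector = SelectKBest(score_func=f_classif, k=10)
    X_selected = selector.fit_transform(X, y)
    print(X.shape, "->", X_selected.shape)   # (569, 30) -> (569, 10)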
5. Feature Scaling
Feature Scaling is the process of transforming the features so that they have a similar scale.
This is important in machine learning because the scale of the features can affect the
performance of the model.
Types of Feature Scaling:
1. Min-Max Scaling: Rescaling the features to a specific range, such as between 0 and 1, by
subtracting the minimum value and dividing by the range.
2. Standard Scaling: Rescaling the features to have a mean of 0 and a standard deviation of 1
by subtracting the mean and dividing by the standard deviation.
3. Robust Scaling: Rescaling the features to be robust to outliers by subtracting the median and dividing by the interquartile range.
Why Feature Scaling?
1. Improves Model Performance: By transforming the features to have a similar scale, the
model can learn from all features equally and avoid being dominated by a few large
features.
2. Increases Model Robustness: By transforming the features to be robust to outliers, the
model can become more robust to anomalies.
3. Improves Computational Efficiency: Many machine learning algorithms, such as k-nearest
neighbors, are sensitive to the scale of the features and perform better with scaled features.
4. Improves Model Interpretability: By transforming the features to have a similar scale, it can
be easier to understand the model’s predictions.
What are the Steps in Feature Engineering?
The steps for feature engineering vary across ML engineers and data scientists. Some of the common steps that are involved in most machine-learning workflows are:
1. Data Cleansing
 Data cleansing (also known as data cleaning or data scrubbing) involves identifying and
removing or correcting any errors or inconsistencies in the dataset. This step is
important to ensure that the data is accurate and reliable.
2. Data Transformation
 Data transformation involves converting features into a more suitable representation, for example through normalization, scaling, or encoding (see the Feature Transformation process above).
3. Feature Extraction
 Feature extraction involves creating new features from existing ones, for example through dimensionality reduction, combination, or aggregation (see the Feature Extraction process above).
4. Feature Selection
 Feature selection involves selecting the most relevant features from the dataset for use
in machine learning. This can include techniques like correlation analysis, mutual
information, and stepwise regression.
5. Feature Iteration
 Feature iteration involves refining and improving the features based on the performance
of the machine learning model. This can include techniques like adding new features,
removing redundant features and transforming features in different ways.

Techniques Used in Feature Engineering


Feature engineering is the process of transforming raw data into features that are suitable for
machine learning models. There are various techniques that can be used in feature engineering
to create new features by combining or transforming the existing ones. The following are some
of the commonly used feature engineering techniques:
One-Hot Encoding
One-hot encoding is a technique used to transform categorical variables into numerical values
that can be used by machine learning models. In this technique, each category is transformed
into a binary value indicating its presence or absence. For example, consider a categorical
variable “Colour” with three categories: Red, Green, and Blue. One-hot encoding would
transform this variable into three binary variables: Colour_Red, Colour_Green, and
Colour_Blue, where the value of each variable would be 1 if the corresponding category is
present and 0 otherwise.
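A minimal pandas sketch of the Colour example above; the small DataFrame is made up for illustration.

    import pandas as pd

    # Hypothetical 'Colour' column with three categories
    df = pd.DataFrame({"Colour": ["Red", "Green", "Blue", "Green"]})

    # One-hot encode into binary indicator columns
    one_hot = pd.get_dummies(df["Colour"], prefix="Colour")
    df = pd.concat([df, one_hot], axis=1)
    print(df)   # adds Colour_Blue, Colour_Green, Colour_Red columns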
Binning
Binning is a technique used to transform continuous variables into categorical variables. In this
technique, the range of values of the continuous variable is divided into several bins, and each
bin is assigned a categorical value. For example, consider a continuous variable “Age” with
values ranging from 18 to 80. Binning would divide this variable into several age groups such
as 18-25, 26-35, 36-50, and 51-80, and assign a categorical value to each age group.
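A minimal pandas sketch of the Age example above, using hypothetical ages and the stated bin edges.

    import pandas as pd

    # Hypothetical 'Age' values binned into the age groups from the example
    df = pd.DataFrame({"Age": [19, 24, 33, 41, 57, 72]})
    df["AgeGroup"] = pd.cut(df["Age"],
                            bins=[17, 25, 35, 50, 80],
                            labels=["18-25", "26-35", "36-50", "51-80"])
    print(df)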
Scaling
The most common scaling techniques are standardization and normalization. Standardization
scales the variable so that it has zero mean and unit variance. Normalization scales the variable
so that it has a range of values between 0 and 1.
Feature Split
Feature splitting is a powerful technique used in feature engineering to improve the
performance of machine learning models. It involves dividing single features into multiple sub-
features or groups based on specific criteria. This process unlocks valuable insights and
enhances the model’s ability to capture complex relationships and patterns within the data.
Text Data Preprocessing
Text data requires special preprocessing techniques before it can be used by machine learning
models. Text preprocessing involves removing stop words, stemming, lemmatization, and vectorization. Stop words are common words that do not add much meaning to the text, such as "the" and "and". Stemming involves reducing words to their root form by chopping off suffixes, which can produce non-dictionary stems, such as converting "studies" to "studi". Lemmatization is similar to stemming, but it reduces words to their dictionary base form, such as converting "studies" to "study" and "running" to "run". Vectorization involves transforming text data into numerical vectors that can be used by machine learning models.

Feature Engineering – Numeric Data

Numeric data, fields, variables, or features typically represent data in the form of scalar
information that denotes an observation, recording, or measurement.

Of course, numeric data can also be represented as a vector of scalars where each specific entity
in the vector is a numeric data point in itself.
 Integers and floats are the most common and widely used numeric data types.

Numeric data is perhaps the easiest to process and is often used directly by Machine Learning
models. Even though numeric data can be directly fed into Machine Learning models, you would
still need to engineer features that are relevant to the scenario, problem, and domain before
building a model. Hence the need for feature engineering remains.

Values :

Usually, scalar values in their raw form indicate a specific measurement, metric, or observation
belonging to a specific variable or field.

 The semantics of this field is usually obtained from the field name itself or a data
dictionary if present.

Counts

Raw numeric measures can also indicate counts, frequencies and occurrences of specific
attributes.

Example: number of hits to a web page, views of a video, etc.

Binarization :

Often raw numeric frequencies or counts are not necessary in building models especially with
regard to methods applied in building recommender engines.

 Merely using raw numeric frequencies or counts is not a good practice to use for
recommender engines.

Suppose our task is to build a recommender to recommend songs to users. One component of the
recommender might predict how much a user will enjoy a particular song.

In this case, the raw listen count is not a robust measure of user taste. Why ?? Users have
different listening habits. Some people might put their favorite songs on infinite loop, while
others might savor them only on special occasions. We can’t necessarily say that someone who
listens to a song 20 times must like it twice as much as someone else who listens to it 10 times.

In this case, a binary feature is preferred as opposed to a count based feature. A more robust
representation of user preference is to binarize the count and clip all counts greater than 1 to
1. In other words, if the user listened to a song at least once, then we count it as the user liking
the song.
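A minimal pandas sketch of binarizing listen counts; the counts are made up for illustration.

    import pandas as pd

    # Hypothetical raw listen counts for (user, song) pairs
    df = pd.DataFrame({"listen_count": [0, 1, 3, 20, 0, 7]})

    # Clip every count greater than 1 down to 1:
    # listened at least once -> the user is taken to like the song
    df["listened"] = (df["listen_count"] > 0).astype(int)
    print(df)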
Rounding :

Often when dealing with numeric attributes like proportions or percentages, we may not need
values with a high amount of precision.

Hence it makes sense to round off these high precision percentages into numeric integers. These
integers can then be directly used as raw numeric values or even as categorical (discrete class
based) features.
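A minimal pandas sketch of rounding a high-precision proportion into an integer percentage; the column name and values are made up for illustration.

    import pandas as pd

    # Hypothetical high-precision proportions rounded to integer percentages
    df = pd.DataFrame({"completion_ratio": [0.4231, 0.8765, 0.1298]})
    df["completion_pct"] = (df["completion_ratio"] * 100).round().astype(int)
    print(df)   # 42, 88, 13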

Binning :

Often when working with numeric data, you might come across features or attributes which
depict raw measures such as values or frequencies.

 In many cases, often the distributions of these attributes are skewed in the sense that some
sets of values will occur a lot and some will be very rare.
 Besides that, there is also the added problem of varying range of these values.

Suppose we are talking about song or video view counts. In some cases, the view counts will be
abnormally large and in some cases very small. Directly using these features in modeling might
cause issues. For example, if we are calculating similarity, a large count in one element of the
data vector would outweigh the similarity in all other elements, which could throw off the entire
similarity measurement for that feature vector.

Metrics like similarity measures, cluster distances, regression coefficients and more might get
adversely affected if we use raw numeric features having values which range
across multiple orders of magnitude.

One solution is to contain the scale by quantizing the count. In other words, we group the counts
into bins, and get rid of the actual count values. Quantization maps a continuous number to a
discrete one. We can think of the discretized numbers as an ordered sequence of bins that
represent a measure of intensity.

Each bin represents a specific degree of intensity and covers a specific range of values that fall into that bin.

In order to quantize data, we have to decide how wide each bin should be. The solutions fall into
two categories: fixed-width or adaptive.

Fixed-Width Binning :

In fixed-width binning, as the name indicates, we have specific fixed widths for each of the bins,
which are usually pre-defined by the user analyzing the data.

 Each bin has a pre-fixed range of values which should be assigned to that bin on the basis
of some business or custom logic, rules, or necessary transformations.
Binning based on rounding is one of the ways, where you can use the rounding operation to bin
raw values.
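A minimal sketch of fixed-width binning with pandas, showing both rounding-based binning (floor division by the bin width) and explicit pre-defined bin edges; the ages and the width of 10 are illustrative assumptions.

    import pandas as pd

    ages = pd.Series([18, 23, 31, 45, 52, 67, 79])

    # Rounding-based binning: floor-divide by a fixed bin width of 10
    bin_by_rounding = ages // 10            # 18 -> 1, 23 -> 2, 45 -> 4, ...

    # Equivalent with explicit, pre-defined fixed-width bin edges
    bin_by_edges = pd.cut(ages, bins=[10, 20, 30, 40, 50, 60, 70, 80],
                          labels=False)
    print(pd.DataFrame({"Age": ages,
                        "bin_by_rounding": bin_by_rounding,
                        "bin_by_edges": bin_by_edges}))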

Adaptive Binning :

So far, we have decided the bin width and ranges in fixed-width binning. But if there are
large gaps in the counts, then there will be many empty bins with no data. Hence, this technique
can lead to irregular bins that are not uniform based on the number of data points or values
which fall in each bin. Some of the bins might be densely populated and some of them might be
sparsely populated or even be empty!

This problem can be solved by adaptively positioning the bins based on the distribution of the
data. In this approach, we use the data distribution itself to decide what should be the
appropriate bins. This can be done using the quantiles of the distribution.

Quantile based binning is a good strategy to use for adaptive binning.

Quantiles are values that divide the data into equal portions. For example, the median divides the
data in halves; half the data points are smaller and half larger than the median.
The quartiles divide the data into quarters, the deciles into tenths, etc.

Thus, q-Quantiles help in partitioning a numeric attribute into q equal partitions.

pandas.DataFrame.quantile and pandas.Series.quantile compute the quantiles. pandas.qcut maps data into a desired number of quantile-based bins.

Let's take a 4-quantile (quartile) based adaptive binning scheme. The following snippet helps us obtain the income values that fall on the quartiles of the distribution and bin the data accordingly.
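Since the original snippet is not reproduced here, the following is a minimal sketch of what it might look like, assuming a DataFrame with a hypothetical 'Income' column.

    import pandas as pd

    # Hypothetical income data
    df = pd.DataFrame({"Income": [22000, 35000, 48000, 51000, 62000,
                                  75000, 90000, 120000, 250000, 400000]})

    # Income values that fall on the quartiles of the distribution
    print(df["Income"].quantile([0.25, 0.5, 0.75]))

    # Quartile (4-quantile) based adaptive binning with pandas.qcut
    df["IncomeQuartileBin"] = pd.qcut(df["Income"], q=4,
                                      labels=["Q1", "Q2", "Q3", "Q4"])
    print(df)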

Feature Scaling :

When dealing with numeric features, the values of these features can be bounded or unbounded. Some features, such as latitude or longitude, are bounded in value. Other numeric
features, such as view counts of a video or web page hits, may increase without bound.

Using the raw values as input features might make models biased toward features having really
high magnitude values.

 Models that are smooth functions of the input, such as linear regression, logistic
regression, or anything that involves a matrix, are affected by the scale of the input.
 Tree-based models, on the other hand, couldn’t care less.

If your model is sensitive to the scale of input features, feature scaling could help. Even
otherwise, it is still recommended to normalize and scale down the features with feature
scaling, especially if you want to try out multiple Machine Learning algorithms on input features.
 As the name suggests, feature scaling changes the scale of the feature. Sometimes people
also call it feature normalization.
 Feature scaling is usually done individually to each feature.

Let us go over several types of common scaling operations, each resulting in a different
distribution of feature values.

Min-Max Scaling :

Let x be an individual feature value (i.e., a value of the feature in some data point), and min(x)
and max(x), respectively, be the minimum and maximum values of this feature over the entire
dataset.

Min-max scaling squeezes (or stretches) all feature values to be within the range [0, 1]. However, the MinMaxScaler class in scikit-learn also allows you to specify your own upper and lower bounds for the scaled value range using the feature_range parameter.

In this scaling, we scale each value x of the feature X by subtracting the minimum value min(X) from it and dividing the result by the difference between the maximum and minimum values of the feature: x_scaled = (x - min(X)) / (max(X) - min(X)).
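A minimal scikit-learn sketch of min-max scaling; the view-count values are made up for illustration.

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    # Hypothetical view-count feature with a wide value range
    X = np.array([[10.0], [200.0], [3500.0], [120000.0]])

    X_scaled = MinMaxScaler().fit_transform(X)             # default range [0, 1]
    X_custom = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)
    print(X_scaled.ravel())
    print(X_custom.ravel())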
Standardized Scaling :

The standard scaler standardizes each value in a feature column by removing the mean and scaling the values so that the variance is 1. This is also known as centering and scaling.

It subtracts off the mean of the feature (over all data points) and divides by the standard deviation, so the resulting feature has unit variance. Hence, it is also called variance scaling.

In this scaling, the mean μX is subtracted from each value x of feature X and the result is divided by the standard deviation σX: z = (x - μX) / σX. This is also popularly known as Z-score scaling. Some formulations divide by the variance instead of the standard deviation, though dividing by the standard deviation is the standard approach.
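A minimal scikit-learn sketch of standardized (Z-score) scaling; the feature values are made up for illustration.

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X = np.array([[12.0], [15.0], [20.0], [35.0], [60.0]])

    # (x - mean) / standard deviation, computed column-wise
    X_std = StandardScaler().fit_transform(X)
    print(X_std.mean(axis=0), X_std.std(axis=0))   # approximately 0 and 1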

Robust Scaling :

The disadvantage of min-max scaling is that often the presence of outliers affects the scaled
values for any feature. Robust scaling tries to use specific statistical measures to scale features
without being affected by outliers.
In this scaling, we scale each value x of feature X by subtracting the median of X and dividing the result by the IQR (Inter-Quartile Range) of X, which is the difference between the first quartile (25th percentile) and the third quartile (75th percentile): x_scaled = (x - median(X)) / IQR(X).
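A minimal scikit-learn sketch of robust scaling; the outlier-heavy values are made up for illustration.

    import numpy as np
    from sklearn.preprocessing import RobustScaler

    # Hypothetical feature containing an extreme outlier
    X = np.array([[10.0], [12.0], [14.0], [15.0], [18.0], [1000.0]])

    # Subtracts the median and divides by the IQR (75th - 25th percentile),
    # so the outlier does not dominate the scaled values.
    X_robust = RobustScaler().fit_transform(X)
    print(X_robust.ravel())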

What is Feature Engineering for Categorical Attributes


In general, feature engineering manipulates and transforms data to extract relevant information to
predict the target variable. The transformations of feature engineering may involve changing the
data representation or applying statistical methods to create new attributes (a.k.a. features).
One of the most common feature engineering methods for categorical attributes is transforming
each categorical attribute into a numeric representation. Transforming categorical data into
numeric data is often called “categorical-column encoding” and creates a new numeric attribute
that is tractable for many ML algorithms. Categorical-column encoding allows data scientists to
quickly and flexibly incorporate categorical attribute information into their ML models.
Categorical-Column Encoding Methods
One-hot encoding
One-hot encoding is the simplest and most basic categorical-column encoding method. The idea
is to have a unique binary number of multiple digits for each category. Hence, the number of
digits is the number of categories for the categorical attribute to be encoded. The binary number
has one digit as 1 and the rest zeros, hence the name ‘one-hot.’
For example, if the category is 'male' or 'female,' then 'male' can be represented by 10 and 'female' by 01. For simplicity, we can drop the redundant right digit, with 'male' represented by 1 and 'female' by 0. The same idea applies to cases with more than two categories: in the general case with N categories, one-hot encoding uses N digits, and dropping one redundant digit leaves N-1 digits (sometimes called dummy encoding).
One-hot encoding treats each of these digits as a new column and allows removal of the original categorical column. The main advantage of one-hot encoding is that it maps the categorical column into multiple easy-to-use binary columns. Also, each new column corresponds to a category in the original categorical column, making the meaning of one-hot encoded features straightforward to understand.
Label Encoding
Label encoding is another well-known and straightforward method that maps the categorical
values into ordinal numbers from 0 to N-1. Each category value is assigned a unique integer (or non-integer) value within the chosen range. For example, ("bachelor's degree," "some college," "master's degree," "associate's degree") can be mapped to (0, 1, 2, 3).
Label encoding is straightforward to implement and solves the problem encountered in using
one-hot encoding with high cardinality columns. In addition, label encoding allows introducing
ordinal/taxonomy relations among categorical values into the feature.
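A minimal pandas sketch of label encoding the education example above; the explicit ordering of the categories is an assumption made to illustrate encoding an ordinal relationship.

    import pandas as pd

    df = pd.DataFrame({"education": ["associate's degree", "bachelor's degree",
                                     "some college", "master's degree"]})

    # Assumed ordinal ranking of the education levels
    order = {"some college": 0, "associate's degree": 1,
             "bachelor's degree": 2, "master's degree": 3}
    df["education_encoded"] = df["education"].map(order)
    print(df)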
Target Encoding
Unlike one-hot and label encoding, which are conducted only on the original categorical
attribute, target encoding finds the mapping from categorical values to numeric values using a
target variable. Target encoding assigns similar numeric values to two categorical values if their
relations with the target variable are similar (and vice versa). Thus, the target encoding features
represent the target values which often positively influences the model training process.
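A minimal pandas sketch of target encoding; the 'city' and 'price' columns are made up for illustration. In practice the per-category means should be computed on the training folds only, to avoid leaking the target into the feature.

    import pandas as pd

    df = pd.DataFrame({
        "city": ["A", "A", "B", "B", "B", "C"],
        "price": [100, 120, 300, 280, 320, 150],   # target variable
    })

    # Replace each category with the mean of the target within that category
    city_means = df.groupby("city")["price"].mean()
    df["city_target_enc"] = df["city"].map(city_means)
    print(df)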
Feature Engineering From Text Data
NLP is a subfield of artificial intelligence concerned with understanding human interaction with machines through natural languages. To understand a natural language, you need to understand how we write a sentence and how we express our thoughts using different words, signs, special characters, etc.; basically, we need to understand the context of a sentence to interpret its meaning.
Raw text cannot be used directly by machine learning models, which expect their input to be numeric. So we need some way to transform input text into numeric features in a meaningful way. There are several approaches for this, and we'll go over beginner-to-advanced feature engineering techniques for textual data.

Textual Feature Engineering


Following are some of the popular techniques used to pre-process, clean, and normalize text (a short NLTK-based sketch follows the list).
 Text tokenization and lower casing
 Removing special characters
 Contraction expansion
 Removing stopwords
 Correcting spellings
 Stemming
 Lemmatization
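As referenced above, the following is a minimal NLTK-based sketch covering lower casing, special-character removal, tokenization, stopword removal, stemming, and lemmatization. It assumes the listed NLTK resources have been downloaded; contraction expansion and spelling correction would need additional tooling and are omitted.

    import re
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    # One-time downloads of the required NLTK resources
    nltk.download("punkt")
    nltk.download("stopwords")
    nltk.download("wordnet")

    text = "The runners were running quickly, and they could not stop!"

    # Lower casing and removal of special characters
    clean = re.sub(r"[^a-z\s]", " ", text.lower())

    # Tokenization and stopword removal
    tokens = [t for t in nltk.word_tokenize(clean)
              if t not in stopwords.words("english")]

    # Stemming (crude suffix chopping) vs. lemmatization (dictionary base form)
    stems = [PorterStemmer().stem(t) for t in tokens]
    lemmas = [WordNetLemmatizer().lemmatize(t, pos="v") for t in tokens]
    print(tokens, stems, lemmas)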

Beyond the cleaned text itself, a number of simple statistical features can be derived from the text (a short pandas sketch follows the list):
 Number of words in text
 Number of unique words in text
 Number of characters in text (including whitespaces)
 Number of stop words in text
 Number of punctuations in text
 Average word length in text
 Number of paragraphs in a text
 Number of contractions (can't, won't, don't, haven't, etc.) in text
 Text Polarity
 Text Subjectivity
 Number of dialogues and narratives
 Mean word length for dialogues and narratives
 Count of part-of-speech (POS) tags in text
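As referenced above, a minimal pandas sketch computing a few of these count-based features; the example sentences and the punctuation pattern are illustrative assumptions.

    import pandas as pd

    df = pd.DataFrame({"text": ["I can't believe it!",
                                "Machine learning is fun. Feature engineering helps."]})

    df["num_chars"] = df["text"].str.len()                    # includes whitespace
    df["num_words"] = df["text"].str.split().str.len()
    df["num_unique_words"] = df["text"].str.split().apply(lambda w: len(set(w)))
    df["num_punctuation"] = df["text"].str.count(r"[.,!?;:]")
    df["avg_word_length"] = df["text"].str.split().apply(
        lambda w: sum(len(x) for x in w) / len(w))
    print(df)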
