
Feature and Feature Extraction
What is a Feature?
• A feature is an individual measurable property within a recorded
dataset. In machine learning and statistics, features are often called
“variables” or “attributes.”
• Relevant features have a correlation or bearing on a model’s use case.
• Example:
• In a patient medical dataset, features could be age, gender, blood
pressure, cholesterol level, and other observed characteristics relevant
to the patient.
Feature extraction
• Feature extraction is a process in machine learning and
data analysis that involves identifying and extracting
relevant features from raw data. These features are
later used to create a more informative dataset, which
can be further utilized for various tasks such as:
• Classification
• Prediction
• Clustering
Advantages
• Feature extraction aims to reduce data complexity (often known as
“data dimensionality”) while retaining as much relevant information
as possible.
• This helps to improve the performance and efficiency of machine
learning algorithms and simplify the analysis process.
• Feature extraction may involve the creation of new features
(“feature engineering”) and data manipulation to separate and
simplify the use of meaningful features from irrelevant ones.
Why is Feature Extraction
Important?
• Feature extraction plays a vital role in many real-world
applications. Feature extraction is critical for processes
such as image and speech recognition, predictive
modeling, and Natural Language Processing (NLP).
• In these scenarios, the raw data may contain many
irrelevant or redundant features. This makes it difficult
for algorithms to accurately process the data.
• By performing feature extraction, the relevant features
are separated (“extracted”) from the irrelevant ones.
• With fewer features to process, the dataset becomes simpler, and
the accuracy and efficiency of the analysis improve.
Common Feature Types:
• Numerical Features: Values with numeric types (int, float, etc.).
Examples: age, salary, height.
• Categorical Features: Features that can take one of a limited
number of values. Examples: gender (male, female, X), color (red,
blue, green).
• Ordinal Features: Categorical features that have a clear
ordering. Examples: T-shirt size (S, M, L, XL).
• Binary Features: A special case of categorical features with only
two categories. Examples: is_smoker (yes, no), has_subscription
(true, false).
• Text Features: Features that contain textual data. Textual data
typically requires special preprocessing steps (like tokenization) to
transform it into a format suitable for machine learning models.
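
Each of these types typically needs its own preprocessing before
modeling. A minimal sketch, assuming hypothetical column names, using
pandas and scikit-learn:

```python
# Sketch: preparing each feature type for a model (column names are
# hypothetical examples, matching the types listed above).
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "age": [25, 32, 47],                 # numerical: used as-is
    "color": ["red", "blue", "green"],   # categorical: no inherent order
    "size": ["S", "L", "M"],             # ordinal: S < M < L < XL
    "is_smoker": ["yes", "no", "no"],    # binary: two categories
})

# Categorical: one-hot encode, since the categories have no ordering.
df = pd.get_dummies(df, columns=["color"])

# Ordinal: map categories to integers that respect the ordering.
enc = OrdinalEncoder(categories=[["S", "M", "L", "XL"]])
df["size"] = enc.fit_transform(df[["size"]]).ravel()

# Binary: map the two categories to 0/1.
df["is_smoker"] = (df["is_smoker"] == "yes").astype(int)
print(df)
```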
Feature Normalization
• Since data features can be measured on different scales, it's often
necessary to standardize or normalize them, especially when using
algorithms that are sensitive to the magnitude and scale of variables
(like gradient descent-based algorithms, k-means clustering, or
support vector machines).
• Normalization standardizes the range of independent variables or
features of the data. This process can make certain algorithms
converge faster and lead to better model performance, especially for
algorithms sensitive to the scale of input features.
Feature normalization helps in
the following ways:
• Scale Sensitivity: Features on larger scales can
disproportionately influence the outcome.
For example, if you have a dataset where one feature ranges from 1 to
1000 and another ranges from 0.1 to 1, the model might pay more
attention to the feature with the larger range.
• Better Performance: Normalization can lead to better performance in
many machine learning models by ensuring that each feature
contributes approximately proportionately to the final decision. This
is especially meaningful for optimization algorithms, as they can
achieve convergence more quickly with normalized features.
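
As a sketch of the 1-to-1000 versus 0.1-to-1 example above, here is
how the two most common normalization schemes rescale such data with
scikit-learn:

```python
# Sketch: rescaling two features measured on very different scales.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0,    0.1],
              [500.0,  0.5],
              [1000.0, 1.0]])  # feature 1 spans 1-1000, feature 2 spans 0.1-1

# Min-max normalization: each feature is rescaled to the [0, 1] range.
print(MinMaxScaler().fit_transform(X))

# Standardization: each feature gets zero mean and unit variance.
print(StandardScaler().fit_transform(X))
```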
Common Feature Extraction
Techniques
• Autoencoders:
Autoencoders can identify key data features. The autoencoder concept
hinges on learning from the coding of the original data sets to
derive new, more potent features. It achieves this by training a
neural network to recreate its input, which forces it to discover and
exploit structures in the data. Through this process, autoencoders
reduce dimensionality and extract significant features from the data,
contributing to more effective machine-learning models.
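
A minimal Keras sketch of this idea, with illustrative layer sizes:
the network is trained to reconstruct its own input, and the
bottleneck layer then serves as the extracted feature representation:

```python
# Sketch: an autoencoder whose 16-unit bottleneck becomes the features.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(100,))                        # 100 raw features
encoded = layers.Dense(16, activation="relu")(inputs)     # bottleneck
decoded = layers.Dense(100, activation="sigmoid")(encoded)

autoencoder = keras.Model(inputs, decoded)  # trained to recreate its input
encoder = keras.Model(inputs, encoded)      # reused for feature extraction

autoencoder.compile(optimizer="adam", loss="mse")
X = np.random.rand(256, 100)                # placeholder data
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)

features = encoder.predict(X)               # compressed 16-dim features
print(features.shape)                       # (256, 16)
```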
• Principal Component Analysis (PCA):
This feature extraction method reduces the dimensionality of
large data sets while preserving the maximum amount of
information. Principal Component Analysis emphasizes
variation and captures important patterns and relationships
between variables in the dataset.
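
A minimal scikit-learn sketch, projecting 10-dimensional toy data
onto its two highest-variance directions:

```python
# Sketch: reduce 10 features to the 2 principal components.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 10)            # placeholder data with 10 features
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (200, 2)
print(pca.explained_variance_ratio_)   # variance preserved per component
```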

• Bag of Words (BoW):
BoW is an effective technique in Natural Language Processing (NLP)
where the words (i.e. features) used in a text can be extracted and
classified by their usage frequency. A vector of word counts
represents each document. Machine learning algorithms then use the
word counts as input.
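
A minimal sketch with scikit-learn's CountVectorizer, turning two
short documents into word-count vectors:

```python
# Sketch: Bag of Words as a document-term count matrix.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)          # sparse count matrix

print(vectorizer.get_feature_names_out())   # the vocabulary (features)
print(X.toarray())                          # word counts per document
```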
• Term Frequency-Inverse Document Frequency (TF-IDF):
An extension of BoW, TF-IDF is an NLP feature extraction
technique that uses a numerical statistic to reflect how
important a word is to a document in a collection or corpus.
Compared to BoW, it considers not only the frequency of a word
in a single document, but all other documents in the corpus. This
helps to adjust for the fact that some words appear more
frequently in general.
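
The same two documents run through scikit-learn's TfidfVectorizer; a
word like "the", which appears in every document, is downweighted
relative to its raw count:

```python
# Sketch: TF-IDF weighting over the same tiny corpus.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "the dog sat"]
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)

print(tfidf.get_feature_names_out())
print(X.toarray().round(2))   # corpus-wide words get lower weights
```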
• Image Processing Techniques:
Image processing techniques involve raw data analysis to
identify and isolate significant characteristics or patterns in an
image. This could involve identifying edges and corners or
extracting features like color, texture, and shape. These features
can then be used for tasks such as image classification, object
detection, and image segmentation.
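
A minimal OpenCV sketch of two such classic features, edges and a
color histogram (the image path is hypothetical):

```python
# Sketch: extract an edge map and a color histogram from an image.
import cv2

img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical path
edges = cv2.Canny(img, 100, 200)   # binary edge map from the Canny detector

color = cv2.imread("example.jpg")
# 32-bin intensity histogram of the first (blue) channel.
hist = cv2.calcHist([color], [0], None, [32], [0, 256])
print(edges.shape, hist.shape)
```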
Learning
• Learning is the process through which a system gets trained and
becomes adaptable so that it gives accurate results.
• Learning is the most important phase: how well the system performs
on the data it is given depends on which algorithms are applied to
that data.
• The entire dataset is divided into two parts: one used to train the
model (the training set) and one used to test the model after
training (the testing set).
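
A minimal sketch of that split with scikit-learn (an 80/20 split is a
common convention; the data here is a placeholder):

```python
# Sketch: dividing a dataset into a training set and a testing set.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 5)              # placeholder features
y = np.random.randint(0, 2, size=100)   # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))        # 80 20
```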
1. Supervised Learning
• In supervised learning, the model is trained on a labeled dataset,
which means that each training example is paired with an output
label. The goal is to learn a mapping from inputs to outputs so that
the model can predict the label for new, unseen data.
• Classification: The task of assigning input data to one of several
predefined categories. For example, handwriting recognition, where
the input is an image of a handwritten character and the output is
the corresponding letter; or labeling an input as red or blue.
• Regression: The task of predicting a continuous/real value. For
example, predicting the price of a house based on its features, or
predicting a person's weight.
• Supervised learning involves training a machine from
labeled data.
• Labeled data consists of examples with the correct
answer or classification.
• The machine learns the relationship between inputs
(fruit images) and outputs (fruit labels).
• The trained machine can then make predictions on
new, unlabeled data.
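
A minimal end-to-end sketch: a model is fit on labeled training
examples, then makes predictions on held-out data (the Iris dataset
stands in for any labeled dataset, and logistic regression for any
supervised model):

```python
# Sketch: supervised learning as learning an input -> label mapping.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)       # labeled examples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)             # learn from labeled data
print(model.score(X_test, y_test))      # accuracy on unseen data
```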
• For Regression
• Mean Squared Error (MSE): MSE measures the average
squared difference between the predicted values and the actual
values. Lower MSE values indicate better model performance.
• Root Mean Squared Error (RMSE): RMSE is the square root of
MSE, representing the standard deviation of the prediction
errors. Similar to MSE, lower RMSE values indicate better model
performance.
• Mean Absolute Error (MAE): MAE measures the average
absolute difference between the predicted values and the actual
values. It is less sensitive to outliers compared to MSE or RMSE.
• R-squared (Coefficient of Determination): R-squared
measures the proportion of the variance in the target variable
that is explained by the model. Higher R-squared values indicate
better model fit.
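
All four metrics are available in scikit-learn; a sketch on
illustrative predicted versus actual values:

```python
# Sketch: computing the regression metrics described above.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.5, 10.0])   # actual values
y_pred = np.array([2.8, 5.4, 7.0, 10.5])   # model predictions

mse = mean_squared_error(y_true, y_pred)
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))                # square root of MSE
print("MAE :", mean_absolute_error(y_true, y_pred))
print("R2  :", r2_score(y_true, y_pred))
```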
• For Classification
• Accuracy: Accuracy is the percentage of predictions that the model
makes correctly. It is calculated by dividing the number of correct
predictions by the total number of predictions.
• Precision: Precision is the percentage of positive predictions that the
model makes that are actually correct. It is calculated by dividing the
number of true positives by the total number of positive predictions.
• Recall: Recall is the percentage of all positive examples that the
model correctly identifies. It is calculated by dividing the number of
true positives by the total number of positive examples.
• F1 score: The F1 score combines precision and recall into a single
measure. It is calculated by taking the harmonic mean of precision
and recall.
• Confusion matrix: A confusion matrix is a table that shows the
number of predictions for each class, along with the actual class
labels. It can be used to visualize the performance of the model and
identify areas where the model is struggling.
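
A matching sketch for the classification metrics, again with
scikit-learn on illustrative binary labels:

```python
# Sketch: computing the classification metrics described above.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual classes
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))  # rows: actual, columns: predicted
```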
Applications of Supervised
learning
• Spam filtering: Supervised learning algorithms can be trained to identify and
classify spam emails based on their content, helping users avoid unwanted
messages.
• Image classification: Supervised learning can automatically classify images
into different categories, such as animals, objects, or scenes, facilitating tasks
like image search, content moderation, and image-based product
recommendations.
• Medical diagnosis: Supervised learning can assist in medical diagnosis by
analyzing patient data, such as medical images, test results, and patient
history, to identify patterns that suggest specific diseases or conditions.
• Fraud detection: Supervised learning models can analyze financial
transactions and identify patterns that indicate fraudulent activity, helping
financial institutions prevent fraud and protect their customers.
• Natural language processing (NLP): Supervised learning plays a crucial role
in NLP tasks, including sentiment analysis, machine translation, and text
summarization, enabling machines to understand and process human language
effectively.
Advantages of Supervised
learning
• Supervised learning makes use of previously collected, labeled data
to produce outputs informed by that experience.
• Helps to optimize performance criteria with the help of
experience.
• Supervised machine learning helps to solve various
types of real-world computation problems.
• It performs classification and regression tasks.
• It allows estimating or mapping the result to a new
sample.
• We have complete control over choosing the number of
classes we want in the training data.
Disadvantages of Supervised
learning
• Classifying big data can be challenging.
• Training for supervised learning needs a lot of computation time,
so the overall process can be slow.
• Supervised learning cannot handle all complex tasks in
Machine Learning.
• It requires a labelled data set.
• It requires a training process.
2. Unsupervised Learning
• In unsupervised learning, the model is trained on a dataset without
labeled responses. The goal is to infer the natural structure present
within the data: to discover patterns and relationships without any
explicit guidance.
• Clustering: The task of grouping a set of objects in such a way
that objects in the same group (or cluster) are more similar to each
other than to those in other groups. Examples include K-means
clustering and hierarchical clustering, e.g. grouping customers by
purchasing behavior.
• Dimensionality Reduction: The process of reducing the number of
random variables under consideration. Techniques include Principal
Component Analysis (PCA) and t-Distributed Stochastic Neighbor
Embedding (t-SNE).
• Association: discovering rules such as "people that buy X also tend
to buy Y."
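
A minimal K-means sketch with scikit-learn on toy 2-D points:

```python
# Sketch: clustering unlabeled points into two groups.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],      # one group of points
              [10, 2], [10, 4], [10, 0]])  # another group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster assigned to each point
print(kmeans.cluster_centers_)   # learned cluster centers
```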
Application of Unsupervised
learning
• Anomaly detection: Unsupervised learning can identify unusual
patterns or deviations from normal behavior in data, enabling the
detection of fraud, intrusion, or system failures.
• Scientific discovery: Unsupervised learning can uncover hidden
relationships and patterns in scientific data, leading to new hypotheses
and insights in various scientific fields.
• Recommendation systems: Unsupervised learning can identify patterns
and similarities in user behavior and preferences to recommend
products, movies, or music that align with their interests.
• Customer segmentation: Unsupervised learning can identify groups of
customers with similar characteristics, allowing businesses to target
marketing campaigns and improve customer service more effectively.
• Image analysis: Unsupervised learning can group images based on their
content, facilitating tasks such as image classification, object detection,
and image retrieval.
Advantages of Unsupervised learning
• It does not require training data to be labeled.
• Dimensionality reduction can be easily accomplished
using unsupervised learning.
• Capable of finding previously unknown patterns in data.
• Unsupervised learning can help you gain insights from
unlabeled data that you might not have been able to get
otherwise.
• Unsupervised learning is good at finding patterns and
relationships in data without being told what to look for.
This can help you learn new things about your data.
3. Semi-Supervised Learning
• This approach involves using both labeled and unlabeled data for
training. Typically, a small amount of labeled data and a large amount
of unlabeled data are used. This is useful when obtaining a fully
labeled dataset is expensive or time-consuming.
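
A minimal sketch of one semi-supervised approach, scikit-learn's
LabelSpreading: unlabeled examples are marked with -1 and receive
labels propagated from the few labeled ones:

```python
# Sketch: label propagation from 2 labeled points to 4 unlabeled ones.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

X = np.array([[1.0], [1.1], [1.2], [5.0], [5.1], [5.2]])
y = np.array([0, -1, -1, 1, -1, -1])   # -1 marks unlabeled examples

model = LabelSpreading().fit(X, y)
print(model.transduction_)             # inferred labels for all points
```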
4. Reinforcement Learning
• Reinforcement learning is a type of machine learning where an agent
learns to make decisions by taking actions in an environment to
maximize cumulative reward. It is often used in scenarios where the
agent interacts with the environment in a sequential manner, such as
game playing or robotic control.
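
A minimal sketch of the core update behind tabular Q-learning, one
common reinforcement learning algorithm (the environment itself is
left hypothetical; only the update rule is shown):

```python
# Sketch: the Q-learning update that drives the agent's decisions.
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))   # value estimate per (state, action)
alpha, gamma = 0.1, 0.9               # learning rate, discount factor

def q_update(state, action, reward, next_state):
    # Move Q(s, a) toward the observed reward plus the discounted
    # value of the best action available in the next state.
    best_next = np.max(Q[next_state])
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])

q_update(state=0, action=1, reward=1.0, next_state=2)
print(Q[0])   # the estimate for (state 0, action 1) moved toward 1.0
```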
5. Feature Extraction and
Selection
• Effective pattern recognition depends on identifying the right features
that capture the underlying structure of the data. Feature extraction
involves transforming raw data into a set of features that can be
effectively used in modeling. Feature selection involves choosing the
most relevant features for the task at hand.
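
A minimal scikit-learn sketch contrasting the two ideas: PCA extracts
new features as combinations of the originals, while SelectKBest
selects a subset of the original features:

```python
# Sketch: feature extraction vs. feature selection on the same data.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

X_extracted = PCA(n_components=2).fit_transform(X)            # new features
X_selected = SelectKBest(f_classif, k=2).fit_transform(X, y)  # subset
print(X_extracted.shape, X_selected.shape)                    # (150, 2) each
```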
6.Deep Learning
• Deep learning is a subset of machine learning that involves neural
networks with many layers (deep neural networks). These models can
automatically learn hierarchical representations of data, making them
powerful for tasks involving large and complex datasets.
