Machine Learning
Ans.. Feature: A feature is an individual measurable property or attribute of the data that is used as an input to a machine learning model.
For example: in a house-price dataset, the size in square feet, the number of rooms, and the location are all features.
Feature Engineering:
Feature engineering is the process of transforming raw data into features that
better represent the problem, leading to improved model performance. This step
is crucial in machine learning as it influences the quality of the predictions made
by the model. It involves selecting, transforming, and creating new features to
enhance the learning process.
1. Feature transformation: Modifying the original features into a new set that
is more useful for the model. This can include:
o Feature extraction: Deriving new features from the original ones
using some mapping or transformation.
2. Feature subset selection: Selecting a subset of features from the full set
that are most important for the model, without generating new features.
Simplified Data Structure: Feature transformation can simplify the
structure of the data, making it easier for the machine learning algorithm to
work with, thus reducing the training time and improving interpretability.
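As a small illustration of feature transformation (a sketch on hypothetical data, assuming NumPy and scikit-learn; not an example from the chapter), a log transform followed by standardization turns a skewed raw variable into a better-behaved feature:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical raw feature: highly skewed values (e.g., incomes)
raw = np.array([[1_000.0], [2_500.0], [40_000.0], [120_000.0], [15_000.0]])

log_feature = np.log1p(raw)                           # feature extraction: log transform reduces skew
scaled = StandardScaler().fit_transform(log_feature)  # standardize to zero mean, unit variance

print(scaled.ravel())
```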
1. Subset Generation:
This step generates potential subsets of features from the full set. Strategies
for generating subsets include:
o Forward Selection: Starts with an empty set and adds features one
by one based on their importance (a code sketch follows this list).
2. Subset Evaluation:
Each subset is evaluated to determine its usefulness. Evaluation can be
done through:
o Filter Methods: Evaluate features based on statistical properties like
correlation or variance, independently of any learning algorithm.
3. Stopping Criteria:
The process stops when a predefined condition is met. Common stopping
criteria include reaching a target number of features, no further improvement in the evaluation score, or exceeding a maximum number of iterations.
4. Subset Validation:
Once a subset is selected, it's validated to ensure its performance on
unseen data using techniques like cross-validation or testing on real-world
datasets.
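The four steps above can be sketched with scikit-learn's SequentialFeatureSelector, which generates subsets by forward selection, evaluates each candidate with cross-validation, and stops once the requested number of features is reached. The dataset, estimator, and parameter choices here are illustrative assumptions, not the text's own example:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Subset generation (forward selection) + subset evaluation (5-fold cross-validated accuracy)
selector = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=5),
    n_features_to_select=5,   # stopping criterion: stop once 5 features are chosen
    direction="forward",
    scoring="accuracy",
    cv=5,
)
selector.fit(X, y)

# Subset validation would then test the chosen features on held-out data
print("Selected feature indices:", selector.get_support(indices=True))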
1. Filter Methods:
Select features based on statistical properties without involving any learning
algorithm. Examples include correlation, mutual information, and chi-square
tests (a code sketch follows this list).
2. Wrapper Methods:
Evaluate subsets by training a model and measuring its performance for
each subset. This method is more computationally intensive but often
provides better results than filter methods.
3. Embedded Methods:
These perform feature selection during the model training process.
Algorithms like Lasso (L1 regularization) or decision trees automatically
select features based on their importance during training.
4. Hybrid Methods:
Combine the benefits of both filter and wrapper approaches. A hybrid
approach first selects a smaller set of features using filter methods and then
uses wrapper methods to refine the final subset.
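A minimal filter-method sketch (scikit-learn assumed, with an illustrative dataset and an arbitrary choice of k): chi-square scores rank features without training any model.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Filter method: score each feature independently of any learning algorithm
selector = SelectKBest(score_func=chi2, k=2)
X_reduced = selector.fit_transform(X, y)

print("Chi-square scores:", selector.scores_)
print("Reduced shape:", X_reduced.shape)   # (150, 2)
```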
OR
Feature subset selection is the process of choosing the most relevant features for
a machine learning model while removing irrelevant or redundant ones. This
improves model performance, reduces overfitting, and decreases computational
cost.
4. Subset Validation: Ensuring the selected subset performs well on new data
through cross-validation or testing on real-world data.
This process improves the model's accuracy, simplifies the model, and makes it
more interpretable (Chapter-4).
Ans…Methods of Feature Subset Selection
1. Filter Methods
Techniques:
Advantages:
Disadvantages:
2. Wrapper Methods
Wrapper methods evaluate feature subsets by training a model and using its
performance to guide the feature selection. These methods are more accurate but
computationally expensive.
Techniques:
o Backward Elimination: Starts with all features and removes the least
important ones.
Advantages:
Disadvantages:
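A short wrapper-method sketch using recursive feature elimination, which works backwards from the full feature set; the estimator, dataset, and feature count are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Wrapper method: repeatedly fit the model and drop the least important feature
rfe = RFE(estimator=DecisionTreeClassifier(random_state=0), n_features_to_select=5)
rfe.fit(X, y)

print("Kept feature indices:", rfe.get_support(indices=True))
print("Feature ranking:", rfe.ranking_)
```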
3. Embedded Methods
Embedded methods perform feature selection during the model training process
itself, making them more efficient than wrapper methods. They are often part of
machine learning algorithms.
Techniques:
o Decision Trees: Automatically select important features based on
their ability to split the data.
Advantages:
Disadvantages:
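A minimal embedded-method sketch (Lasso with an assumed regularization strength on a standard toy dataset): coefficients driven to zero mark the features the model effectively discards while it trains.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Embedded method: L1 regularization shrinks weak coefficients to exactly zero during training
lasso = Lasso(alpha=1.0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
print("Indices of features with non-zero coefficients:", selected)
```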
4. Hybrid Methods
Hybrid methods combine the benefits of filter and wrapper methods. They
typically use a filter method to reduce the feature set first and then refine the
selection with a wrapper or embedded method.
Techniques:
Advantages:
o Useful for large datasets where wrapper methods alone would be too
slow.
Disadvantages:
o Still more computationally expensive than pure filter methods.
Conclusion:
Each feature subset selection method has its pros and cons. Filter methods are
suitable for large datasets with minimal computational resources, while wrapper
methods are more accurate but slower. Embedded methods provide a balance
between efficiency and performance, while hybrid methods combine different
approaches to get the best results. The choice of method depends on factors such
as dataset size, model complexity, and the available computational power
(Chapter-4, Chapter-5).
OR
1. Filter Methods:
2. Wrapper Methods:
o Techniques include Recursive Feature Elimination (RFE), forward
selection, and backward elimination.
3. Embedded Methods:
4. Hybrid Methods:
Each method has its use depending on the dataset size, model complexity, and
computational resources (Chapter-4, Chapter-5).
Ans….
Q.6… Explain the methods of feature extraction in detail.
Ans… Feature extraction is a crucial process in machine learning and data analysis,
aimed at transforming raw data into a set of features that effectively represent the
underlying patterns in the data. Here are some common methods of feature
extraction explained in detail:
1. Principal Component Analysis (PCA)
How it Works:
o It computes the covariance matrix of the data and then calculates the
eigenvalues and eigenvectors.
Applications:
Advantages:
Disadvantages:
o PCA assumes linear relationships and may not perform well with non-
linear data.
2. Linear Discriminant Analysis (LDA)
How it Works:
o LDA computes the mean and scatter of each class and then
determines the linear combinations of features that best separate the
classes.
Applications:
Advantages:
o Works well with small datasets and when the classes are well-
separated.
Disadvantages:
o Assumes that features are normally distributed and have the same
covariance matrix for all classes.
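A brief sketch of LDA as a supervised projection (scikit-learn assumed; the dataset and two-component choice are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# LDA finds at most (n_classes - 1) directions that maximize class separation
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print("Projected shape:", X_lda.shape)                     # (150, 2)
print("Explained variance ratio:", lda.explained_variance_ratio_)
```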
3. Independent Component Analysis (ICA)
How it Works:
Applications:
Advantages:
o Effective for separating sources in cases where PCA may not work.
Disadvantages:
4. t-Distributed Stochastic Neighbor Embedding (t-SNE)
How it Works:
Applications:
Advantages:
Disadvantages:
5. Autoencoders
How it Works:
o An autoencoder consists of an encoder that compresses the input
into a lower-dimensional latent space and a decoder that
reconstructs the input from this representation.
Applications:
Advantages:
Disadvantages:
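A minimal autoencoder sketch (PyTorch assumed, synthetic data, untuned hyperparameters) showing the encoder/decoder structure and the reconstruction objective described above:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_inputs=20, n_latent=3):
        super().__init__()
        # Encoder compresses the input into a low-dimensional latent code
        self.encoder = nn.Sequential(nn.Linear(n_inputs, 8), nn.ReLU(), nn.Linear(8, n_latent))
        # Decoder reconstructs the input from the latent code
        self.decoder = nn.Sequential(nn.Linear(n_latent, 8), nn.ReLU(), nn.Linear(8, n_inputs))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, 20)                     # stand-in for real feature vectors

for _ in range(200):                         # minimize reconstruction error
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), x)
    loss.backward()
    optimizer.step()

latent_features = model.encoder(x)           # extracted (compressed) features
print(latent_features.shape)                 # torch.Size([256, 3])
```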
6. Feature Hashing (Hashing Trick)
How it Works:
o Collisions may occur when different features are hashed to the same
index, leading to some information loss.
Applications:
o Widely used in text classification and natural language processing,
especially with large vocabulary sizes.
Advantages:
Disadvantages:
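A small sketch of the hashing trick for text (scikit-learn's HashingVectorizer; the tiny corpus and feature count are illustrative):

```python
from sklearn.feature_extraction.text import HashingVectorizer

docs = ["feature hashing maps tokens to indices",
        "collisions may map different tokens to the same index"]

# Each token is hashed into one of n_features buckets; no vocabulary is stored
vectorizer = HashingVectorizer(n_features=16, alternate_sign=False)
X = vectorizer.transform(docs)

print(X.shape)          # (2, 16) sparse matrix
print(X.toarray())
```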
Summary:
Issues in High-Dimensional Data:
1. Overfitting:
3. Sparsity:
4. Irrelevant Features:
5. Diminishing Distance:
6. Visualization Difficulties:
1. Dimensionality Reduction:
4. Handling Sparsity:
6. Visualization:
o Dimensionality reduction techniques like PCA and t-SNE make it
possible to visualize high-dimensional data in 2D or 3D spaces,
facilitating better understanding and interpretation of data patterns.
Summary:
High-dimensional data can lead to several challenges in machine learning and data
analysis. Here are some of the key issues associated with high-dimensional
datasets:
1. Overfitting:
o Models may fit the noise in the training data rather than the
underlying pattern, leading to poor generalization to unseen data.
3. Sparsity:
o As dimensionality increases, data points become more sparse,
making it difficult to find patterns and relationships within the data.
5. Diminishing Distance:
o As dimensionality grows, the distances between points become nearly uniform, so distance-based methods such as k-nearest neighbours lose their ability to distinguish near from far points (see the sketch after this list).
6. Visualization Difficulties:
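A quick numerical sketch of the diminishing-distance effect (NumPy, random data; purely illustrative): as the number of dimensions grows, the nearest and farthest neighbours of a point become almost equally far away.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((500, d))                      # 500 random points in d dimensions
    dists = np.linalg.norm(X - X[0], axis=1)[1:]  # distances from the first point
    ratio = dists.min() / dists.max()
    print(f"d={d:5d}  nearest/farthest distance ratio = {ratio:.2f}")
```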
Feature reduction techniques help mitigate the problems associated with high-
dimensional data by simplifying the dataset while retaining its essential
information. Here’s how feature reduction can address these issues:
1. Reducing Overfitting:
6. Facilitating Visualization:
Conclusion
Q.9… What is dimensionality reduction? Explain PCA in detail.
Ans… Dimensionality Reduction: Dimensionality reduction is the process of reducing the number of input features in a dataset while retaining as much of the important information as possible. It helps counter the curse of dimensionality, lowers computational cost, and can improve a model's ability to generalize. PCA is one of the most widely used techniques.
Steps in PCA:
1. Standardization:
o Scale the features so that each has zero mean and unit variance.
2. Covariance Matrix Computation:
o Compute the covariance matrix of the standardized data.
3. Eigen Decomposition:
o Calculate the eigenvalues and eigenvectors of the covariance matrix.
4. Component Selection:
o Sort eigenvalues in descending order and select the top k
eigenvectors corresponding to the largest eigenvalues.
5. Projection:
o Project the original data onto the new feature space using the
selected eigenvectors to obtain the reduced dataset.
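These steps can be sketched directly with NumPy on illustrative random data (in practice a library implementation such as sklearn.decomposition.PCA would normally be used):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                     # hypothetical dataset: 200 samples, 5 features
k = 2

# 1. Standardization
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix
cov = np.cov(X_std, rowvar=False)

# 3. Eigen decomposition
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort eigenvalues (descending) and keep the top k eigenvectors
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:k]]

# 5. Project the data onto the selected components
X_reduced = X_std @ components
print(X_reduced.shape)                            # (200, 2)
```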
Advantages of PCA:
Disadvantages of PCA:
Assumes linear relationships, which may not be valid for all datasets.
Conclusion
Unit 5
Ans… Definitions:
a. Random Variables:
A random variable is a variable whose value is determined by the outcome of a random experiment.
Discrete: Takes on a countable set of values (e.g., the number of heads in ten coin tosses).
Continuous: Takes on any value within a range (e.g., the height of people).
b. Probability:
Probability is the measure of the likelihood that a specific event will occur. It
ranges from 0 (impossible event) to 1 (certain event). The probability of event A
is denoted P(A), where 0 ≤ P(A) ≤ 1.
c. Conditional Probability:
The probability of event A occurring given that event B has already occurred:
P(A | B) = P(A ∩ B) / P(B),
where P(A ∩ B) is the probability of both events A and B
occurring.
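A small worked example (illustrative, not from the chapter): when rolling a fair die, let A = "the result is even" and B = "the result is greater than 3". Then P(B) = 3/6 and P(A ∩ B) = 2/6 (the outcomes 4 and 6), so P(A | B) = (2/6) / (3/6) = 2/3.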
d. Discrete Distributions:
Describe random variables that take on countable values;
probabilities are given by a probability mass function (PMF). Examples include the Bernoulli, Binomial, and Poisson distributions.
e. Continuous Distributions:
Describe random variables that can take any value within a continuous range;
probabilities are expressed as areas under a probability density function (PDF).
Examples include the Normal distribution and Exponential distribution.
f. Sampling:
g. Testing:
h. Hypothesis:
Sample Space (S): The set of all possible outcomes of an experiment. For
example, the sample space for a coin toss is S = {H, T}.
Probability of an Event (P(A)): The likelihood that a specific event will occur.
It is defined as the ratio of favorable outcomes to the total number of
outcomes, where 0 ≤ P(A) ≤ 1.
1. Handling Uncertainty:
2. Modeling Randomness:
3. Probabilistic Predictions:
4. Bayesian Inference:
5. Performance Metrics:
Conclusion
Ans…
Q.4… What is the difference between discrete distributions and continuous
distributions?
Ans…
Q.5.. Write a note on the Central Limit Theorem.
Ans..
Q.6… Explain Monte Carlo Approximation
Ans… Monte Carlo approximation is a technique that uses repeated random sampling to estimate numerical results. It is often used to
solve problems that may be deterministic in nature but are difficult or impossible
to solve analytically.
1. Random Sampling:
o The core idea behind Monte Carlo methods is to use random samples
to represent the space of possible outcomes. By randomly generating
inputs or scenarios, we can simulate the behavior of complex
systems.
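A classic illustrative sketch (NumPy; not from the text) is estimating π by sampling random points in the unit square and counting the fraction that fall inside the quarter circle:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Random sampling: draw points uniformly from the unit square
x, y = rng.random(n), rng.random(n)

# The fraction of points inside the quarter circle approximates pi / 4
inside = (x**2 + y**2) <= 1.0
pi_estimate = 4.0 * inside.mean()
print(pi_estimate)        # close to 3.1416; accuracy improves with more samples
```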
1. Applications:
2. Benefits:
3. Limitations:
Conclusion