Unit-4 ML
Assumptions of LDA:
1. The features are (approximately) normally distributed within each class.
2. All classes share the same covariance matrix (homoscedasticity).
3. The observations are statistically independent of each other.
Steps:
1. Calculating mean vectors for each class.
2. Computing within-class and between-class scatter matrices to understand the
distribution and separation of classes.
3. Solving for the eigenvalues and eigenvectors that maximize the between-class
variance relative to the within-class variance. This defines the optimal projection
space to distinguish the classes.
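A minimal NumPy sketch of these steps for a small two-class example (the data and variable names below are illustrative assumptions, not part of the notes):

import numpy as np

# Hypothetical two-class dataset: rows are samples, columns are features.
X = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0],
              [9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

classes = np.unique(y)
overall_mean = X.mean(axis=0)
n_features = X.shape[1]

# Step 1: mean vector for each class.
means = {c: X[y == c].mean(axis=0) for c in classes}

# Step 2: within-class (S_w) and between-class (S_b) scatter matrices.
S_w = np.zeros((n_features, n_features))
S_b = np.zeros((n_features, n_features))
for c in classes:
    X_c = X[y == c]
    diff = X_c - means[c]
    S_w += diff.T @ diff
    mean_diff = (means[c] - overall_mean).reshape(-1, 1)
    S_b += len(X_c) * (mean_diff @ mean_diff.T)

# Step 3: eigenvectors of inv(S_w) @ S_b define the projection directions.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:1]].real          # top direction (at most C - 1 = 1 here)

X_lda = X @ W                           # data projected onto the LDA axis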
Applications:
Facial Recognition
Medical Diagnostics
Marketing-Customer Segmentation
Principal Component Analysis (PCA):
Unsupervised dimension reduction technique.
Principal Component Analysis (PCA) is a powerful technique used in data analysis,
particularly for reducing the dimensionality of datasets while preserving crucial
information.
Transforming the original variables into a set of new, uncorrelated variables called
principal components.
Reduces dimensions without considering class labels.
Useful for data visualization and simplifying complex data.
PCA identifies and uses the principal components (directions that maximize variance
and are orthogonal to each other) to effectively project data into a lower-dimensional
space.
Steps:
Standardize the Data
If the features of your dataset are on different scales, it’s essential to
standardize them (subtract the mean and divide by the standard
deviation).
Compute the Covariance Matrix
Calculate the covariance matrix for the standardized dataset.
Compute Eigenvectors and Eigenvalues
Find the eigenvectors and eigenvalues of the covariance matrix. The
eigenvectors represent the directions of maximum variance, and the
corresponding eigenvalues indicate the magnitude of variance along those
directions.
Sort Eigenvectors by Eigenvalues
Sort the eigenvectors based on their corresponding eigenvalues in descending
order.
Choose Principal Components
Select the top k eigenvectors (principal components) where k is the
desired dimensionality of the reduced dataset.
Transform the Data
Multiply the original standardized data by the selected principal
components to obtain the new, lower-dimensional representation of the data.
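A minimal NumPy sketch of these steps (the random data matrix and the choice k = 2 are illustrative assumptions):

import numpy as np

X = np.random.rand(100, 5)             # hypothetical dataset: 100 samples, 5 features
k = 2                                   # desired number of principal components

# 1. Standardize the data (zero mean, unit variance per feature).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Compute the covariance matrix of the standardized data.
cov = np.cov(X_std, rowvar=False)

# 3. Eigenvectors and eigenvalues of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort eigenvectors by eigenvalue in descending order.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 5. Choose the top k eigenvectors (principal components).
components = eigvecs[:, :k]

# 6. Transform the data into the lower-dimensional space.
X_reduced = X_std @ components
print(X_reduced.shape)                  # (100, 2)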
Applications:
Exploratory Data Analysis
Predictive Modeling
Image compression
Genomics for pattern recognition
Financial data for uncovering latent patterns and correlations
Visualization of Complex datasets
Factor Analysis:
Factor analysis is a statistical method used to describe variability among observed,
correlated variables in terms of a potentially lower number of unobserved variables
called factors.
Used to identify underlying, unmeasured variables (factors) that explain the
variability across observed variables.
Focuses on understanding latent structures in the data.
Useful for revealing relationships and reducing dimensions based on these latent
factors.
Assumptions:
1. Linearity: The relationships between variables and factors are assumed to be linear.
2. Adequate Sample Size: Factor analysis generally requires a sufficient sample size to
produce reliable results. The adequacy of the sample size can depend on factors such
as the complexity of the model and the ratio of variables to cases.
3. Uniqueness: Each variable should have unique variance that is not explained by the
factors. This assumption is particularly important in common factor analysis.
4. Linearity of Factor Scores: The relationship between the observed variables and the
latent factors is assumed to be linear, even though the observed variables may not be
linearly related to each other.
5. Interval or Ratio Scale: Factor analysis typically assumes that the variables are
measured on interval or ratio scales, as opposed to nominal or ordinal scales.
Steps:
1. Determine the Suitability of Data for Factor Analysis
Check that the variables are sufficiently correlated, for example using Bartlett's test of
sphericity or the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy.
2. Choose the Extraction Method
Principal Component Analysis (PCA): Used when the main goal is data reduction.
Principal Axis Factoring (PAF): Used when the main goal is to identify underlying factors.
3. Factor Extraction
Use the chosen extraction method to identify the initial factors.
Extract eigenvalues to determine the number of factors to retain. Factors
with eigenvalues greater than 1 are typically retained in the analysis.
Compute the initial factor loadings.
4. Determine the Number of Factors to Retain
Scree Plot: Plot the eigenvalues in descending order to visualize the point
where the plot levels off (the “elbow”) to determine the number of factors to
retain.
Eigenvalues: Retain factors with eigenvalues greater than 1.
5. Factor Rotation
Orthogonal Rotation (Varimax, Quartimax): Assumes that the factors are
uncorrelated.
Oblique Rotation (Promax, Oblimin): Allows the factors to be correlated.
Rotate the factors to achieve a simpler and more interpretable factor structure.
Examine the rotated factor loadings.
6. Interpret and Label the Factors
Analyze the rotated factor loadings to interpret the underlying meaning of
each factor.
Assign meaningful labels to each factor based on the variables with high
loadings on that factor.
7. Compute Factor Scores (if needed)
Calculate the factor scores for each individual to represent their value on
each factor.
8. Report and Validate the Results
Report the final factor structure, including factor loadings and communalities.
Validate the results using additional data or by conducting a confirmatory factor
analysis if necessary.
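A brief sketch of extraction, rotation, loadings, and communalities with scikit-learn's FactorAnalysis (the data are randomly generated for illustration, and the varimax rotation option assumes scikit-learn 0.24 or newer):

import numpy as np
from sklearn.decomposition import FactorAnalysis

X = np.random.rand(200, 6)                      # hypothetical observed variables
X = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize

# Extract 2 factors with a varimax (orthogonal) rotation.
fa = FactorAnalysis(n_components=2, rotation="varimax")
scores = fa.fit_transform(X)                    # factor scores for each observation

loadings = fa.components_.T                     # factor loadings: variables x factors
communalities = (loadings ** 2).sum(axis=1)     # variance of each variable explained by the factors

print(loadings.round(2))
print(communalities.round(2))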
Factor loading:
The correlation coefficient between an observed variable and an underlying factor is
called the factor loading.
Communalities:
The proportion of each observed variable’s variance that can be explained by the
factors.
Applications:
Dimensionality Reduction
Identifying Latent Constructs
Data Summarization
Hypothesis Testing
Variable Selection
Improving Predictive Models
Independent Component Analysis (ICA):
Independent Component Analysis (ICA) is an unsupervised technique that separates a
multivariate signal into additive components that are statistically independent of each other.
Assumptions:
The first assumption asserts that the source signals (original signals) are statistically
independent of each other.
The second assumption is that each source signal exhibits non-Gaussian distributions.
Steps:
1. Centering adjusts the data to have a zero mean, ensuring that analyses focus on
variance rather than mean differences.
2. Whitening transforms the data into uncorrelated variables, simplifying the
subsequent separation process.
3. After these steps, ICA applies iterative methods to separate independent components,
and it often uses auxiliary methods like PCA or singular value decomposition (SVD)
to lower the number of dimensions at the start. This sets the stage for efficient and
robust component extraction.
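A short sketch using scikit-learn's FastICA to unmix two artificially mixed signals (the source signals and mixing matrix are illustrative assumptions):

import numpy as np
from sklearn.decomposition import FastICA

# Two hypothetical independent, non-Gaussian source signals.
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                        # sinusoidal source
s2 = np.sign(np.sin(3 * t))               # square-wave source
S = np.c_[s1, s2]

# Observed signals: an (assumed) linear mixture of the sources.
A = np.array([[1.0, 0.5], [0.5, 2.0]])    # hypothetical mixing matrix
X = S @ A.T

# FastICA centers and whitens the data internally, then iteratively
# extracts statistically independent components.
ica = FastICA(n_components=2, random_state=0)
S_estimated = ica.fit_transform(X)        # recovered independent components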
Applications:
Blind source separation (e.g., separating mixed audio recordings, the "cocktail party problem")
Artifact removal from EEG and fMRI signals
Feature extraction from images and other high-dimensional data
Locally Linear Embedding (LLE):
Locally Linear Embedding (LLE) is an unsupervised, nonlinear dimensionality reduction
technique that preserves the local neighborhood structure of the data.
Advantages:
Handling Non-Linearity: LLE can capture nonlinear patterns and structures in the data,
in contrast to linear techniques like Principal Component Analysis (PCA). It is
especially helpful when working with complicated, curved, or twisted datasets.
Dimensionality Reduction: LLE lowers the dimensionality of the data while
preserving its fundamental properties. Particularly when working with high-
dimensional datasets, this reduction makes data presentation, exploration, and analysis
simpler.
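A small sketch using scikit-learn's LocallyLinearEmbedding on the swiss-roll toy dataset (the dataset and parameter choices are illustrative):

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Hypothetical nonlinear dataset: a 3-D "swiss roll" manifold.
X, color = make_swiss_roll(n_samples=1000, random_state=0)

# Unroll the manifold into 2 dimensions using 12 nearest neighbors.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
X_2d = lle.fit_transform(X)
print(X_2d.shape)   # (1000, 2)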
Isomap:
Isomap (short for isometric mapping) is a nonlinear dimensionality reduction method
used in data analysis and machine learning.
This technique works especially well for extracting the underlying structure from
large, complex datasets, like those from speech recognition, image analysis, and
biological systems.
Steps:
1. Calculate the pairwise distances: The algorithm starts by calculating the Euclidean
distances between the data points.
2. Find nearest neighbors: For each data point, its k nearest neighbors are determined
using these distances.
3. Create a neighborhood graph: Each point is connected by edges to its nearest
neighbors, producing a graph that represents the data's local structure.
4. Calculate geodesic distances: A shortest-path algorithm (such as Floyd-Warshall or
Dijkstra) finds the shortest path between every pair of points in the neighborhood
graph; these shortest-path lengths approximate the geodesic distances along the manifold.
5. Perform dimensionality reduction: Classical Multidimensional Scaling (MDS) is
applied to the geodesic distance matrix, producing a low-dimensional embedding of the data.
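A small sketch of these steps using scikit-learn's Isomap on the S-curve toy dataset (the dataset and parameter choices are illustrative):

from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

# Hypothetical nonlinear dataset: a 3-D "S-curve" manifold.
X, color = make_s_curve(n_samples=1000, random_state=0)

# Internally: k-nearest-neighbor graph, shortest-path (geodesic) distances,
# then classical MDS to embed the data in 2 dimensions.
iso = Isomap(n_neighbors=10, n_components=2)
X_2d = iso.fit_transform(X)
print(X_2d.shape)   # (1000, 2)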
Advantages:
Captures nonlinear manifold structure that linear methods such as PCA miss, by using
geodesic rather than straight-line (Euclidean) distances.
Preserves the global geometry of the data, not just local neighborhoods.
Applications:
Visualization: High-dimensional data like face images can be visualized in a lower-
dimensional space, enabling easier exploration and understanding.
Data exploration: Isomap can help identify clusters and patterns within the data that
are not readily apparent in the original high-dimensional space.
Anomaly detection: Outliers that deviate significantly from the underlying manifold
can be identified using Isomap.
Machine learning tasks: Isomap can be used as a pre-processing step for other
machine learning tasks, such as classification and clustering, by improving the
performance and interpretability of the models.
Least Squares Optimization:
Least squares optimization is a mathematical technique that minimizes the sum of
squared residuals to find the best-fitting curve for a set of data points.
It's a type of regression analysis that's often used by statisticians and traders to
identify trends and trading opportunities.
Steps:
1. Determine the equation of the line you believe best fits the data.
Denote the independent variable values as xi and the dependent ones as yi.
Calculate the average values of xi and yi as X and Y.
Presume the equation of the line of best fit as y = mx + c, where m is the slope of the
line and c represents the intercept of the line on the Y-axis.
The slope m and intercept c can be calculated from the following formulas:
m = sum((xi - X)(yi - Y)) / sum((xi - X)^2) and c = Y - m*X.
2. Calculate the residuals (differences) between the observed values and the values
predicted by your model.
3. Square each of these residuals and sum them up.
4. Adjust the model to minimize this sum.
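A minimal NumPy sketch of these steps (the data points are illustrative):

import numpy as np

# Hypothetical data points.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.1, 5.9, 8.2, 9.8])

X_mean, Y_mean = x.mean(), y.mean()

# Slope and intercept from the least-squares formulas.
m = np.sum((x - X_mean) * (y - Y_mean)) / np.sum((x - X_mean) ** 2)
c = Y_mean - m * X_mean

residuals = y - (m * x + c)             # observed minus predicted values
sse = np.sum(residuals ** 2)            # sum of squared residuals being minimized
print(m, c, sse)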
Evolutionary Learning:
Evolutionary learning is a type of machine learning that applies evolutionary
algorithms to solve optimization problems.
These algorithms mimic the process of natural selection, where candidate solutions to
a problem are randomly generated and evaluated based on a fitness function.
The best solutions are then selected and combined to create new solutions, iteratively
improving over time until an optimal or near-optimal solution is found.
Genetic algorithm:
The Genetic Algorithm is a computational approximation to how evolution performs
search, which is by producing modifications of the parent genomes in their offspring
and thus producing new individuals with different fitness.
We need to model simple genetics inside a computer and solve problems using:
o a method for representing problems as chromosomes
o a way to calculate the fitness of a solution
o a selection method to choose parents
o a way to generate offspring by breeding the parents
Example: Solving a Knapsack Problem using Genetic algorithm rather than greedy
approach.
String Representation:
In genetic algorithms, string representation (also known as chromosome encoding) is
a method of representing potential solutions to an optimization problem.
This representation is crucial because it impacts how genetic operators like crossover
and mutation are applied.
Here are some common string representations used in genetic algorithms:
1. Binary Representation
Description: Each individual is represented as a string of binary digits (0s and 1s).
Example: 10101011
Usage: Often used for problems where solutions can be naturally expressed as binary
numbers, such as combinatorial problems.
2. Integer Representation
Description: Each individual is represented as a string of integers.
Example: 5 2 9 1
Usage: Suitable for problems whose decision variables take discrete values, such as
scheduling or routing.
3. Real-Valued Representation
Description: Each individual is represented as a vector of real numbers.
Example: 3.14 2.71 1.41 0.577
Usage: Suitable for optimization problems in continuous search spaces, such as
function optimization.
4. Permutation Representation
Description: Each individual is an ordering (permutation) of a set of elements.
Example: 3 1 4 2
Usage: Used for ordering problems such as the Travelling Salesman Problem, where each
element must appear exactly once.
Evaluating Fitness:
Evaluating fitness in genetic algorithms is a crucial step to determine how well each
individual (or solution) in the population solves the given problem.
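For instance, with a binary chromosome for the knapsack problem mentioned earlier, a fitness function might look like the following sketch (the item values, weights, and capacity are hypothetical):

# Hypothetical knapsack instance.
values = [60, 100, 120, 30]     # value of each item
weights = [10, 20, 30, 5]       # weight of each item
capacity = 50                   # knapsack capacity

def fitness(chromosome):
    """Total value of selected items; 0 if the weight limit is exceeded."""
    total_value = sum(v for v, bit in zip(values, chromosome) if bit)
    total_weight = sum(w for w, bit in zip(weights, chromosome) if bit)
    return total_value if total_weight <= capacity else 0

print(fitness([1, 0, 1, 1]))    # items 1, 3 and 4 selected -> value 210, weight 45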
Population:
Population refers to a collection of potential solutions to the problem.
Role of the Population:
The population is the pool of candidate solutions from which parents are selected; its
size and diversity determine how widely the search space is explored.
Selection Methods:
1. Roulette Wheel Selection
Mechanism: Each individual is selected with a probability proportional to its fitness,
like spinning a roulette wheel whose slots are sized by fitness.
2. Tournament Selection
Mechanism: A small group of individuals is sampled at random, and the fittest member of
the group is selected as a parent.
3. Rank Selection
Mechanism: Rank all individuals based on their fitness and assign selection
probabilities based on these ranks.
Advantages: Reduces the risk of premature convergence by ensuring even low-fitness
individuals have a chance of being selected.
Disadvantages: Requires sorting the population, which can add computational
overhead.
4. Stochastic Universal Sampling
Mechanism: Similar to roulette wheel selection but uses multiple, evenly spaced
selection points to ensure a more even spread of selected individuals.
Advantages: Provides a more uniform selection of individuals.
Disadvantages: Can be more complex to implement compared to roulette
wheel selection.
5. Truncation Selection
Mechanism: Only the top fraction of the population, ranked by fitness, is eligible to
become parents; the rest are discarded.
Genetic Operators:
Genetic operators are the mechanisms that drive the evolutionary process in genetic
algorithms.
They modify the genetic composition of the population to create new individuals
(solutions) with the goal of improving overall fitness.
Here are the primary genetic operators (a short code sketch follows this list):
1. Selection
Purpose: To choose individuals from the population to act as parents for the
next generation.
Methods:
o Roulette Wheel Selection: Probability of selection is proportional to fitness.
o Tournament Selection: A subset of individuals compete, and the fittest
is selected.
o Rank Selection: Individuals are ranked, and selection is based on rank.
2. Crossover (Recombination)
Purpose: To combine genetic material from two parents to produce new offspring.
This operator mimics biological reproduction and introduces new genetic
structures into the population.
Types:
o Single-point Crossover: One crossover point is chosen, and segments are
swapped between the parents.
o Multi-point Crossover: Two crossover points are chosen, and segments
between them are swapped.
o Uniform Crossover: Each gene is independently chosen from one of
the parents with a specific probability.
3. Mutation
Purpose: To introduce small random changes into individuals, maintaining genetic
diversity and helping the search escape local optima.
Example: flipping a single bit of a binary chromosome, e.g. 10101011 -> 10001011.
4. Replacement
Purpose: To decide which individuals of the current population are replaced by the new
offspring, for example generational replacement (the whole population is replaced) or
steady-state replacement (only a few individuals are replaced each generation).
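A minimal sketch of these operators for binary chromosomes represented as Python lists of 0s and 1s (the function names and parameters are illustrative assumptions):

import random

def tournament_selection(population, fitness_fn, k=3):
    """Pick k random individuals and return the fittest of them."""
    contestants = random.sample(population, k)
    return max(contestants, key=fitness_fn)

def single_point_crossover(parent1, parent2):
    """Swap the tails of two parents after a random crossover point."""
    point = random.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def mutate(chromosome, rate=0.01):
    """Flip each bit independently with probability rate."""
    return [1 - bit if random.random() < rate else bit for bit in chromosome]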
Elitism:
Elitism is a strategy used to ensure that the best individuals (those with the
highest fitness) from the current generation are carried over to the next generation
without any changes. This helps in preserving the best solutions found so far.
Purpose: To avoid losing the best solutions due to the random nature of
selection, crossover, or mutation.
Implementation: A fixed number of the top-performing individuals (elites) are
directly copied to the next generation.
Benefit: Increases the chances of convergence to a global optimum by
preserving high-quality solutions.
Niching:
Niching techniques maintain diversity in the population by encouraging it to form and
preserve subpopulations (niches), each of which can converge to a different optimum.
Methods:
o Fitness Sharing: Reduces the fitness of individuals that are close to
each other, effectively spreading the population across different niches.
o Crowding: Limits the replacement of similar individuals to preserve diversity.
o Clearing: Allocates a limited number of resources (reproductive
opportunities) to individuals in each niche.
Benefits:
o Helps in exploring multiple solutions simultaneously.
o Prevents premature convergence to a single solution by maintaining diversity.
Challenges: Implementing and tuning niching methods can be complex and
computationally intensive.
Steps of a Genetic Algorithm (a code sketch follows these steps):
1. Initialization:
o Begin with a randomly generated population of potential solutions,
called chromosomes or individuals.
2. Fitness Evaluation:
o Each individual is evaluated using a fitness function that measures how well it
solves the problem at hand.
3. Selection:
o Select individuals based on their fitness scores to be parents.
Common methods include roulette wheel selection and tournament
selection.
4. Crossover (Recombination):
o Combine pairs of parents to produce offspring. This mimics
biological reproduction and allows the mixing of genetic information.
5. Mutation:
o Introduce random changes to some individuals to maintain genetic
diversity and explore new solutions.
6. Replacement:
o Replace some or all of the population with new offspring, depending on
the specific algorithm.
7. Iteration:
o Repeat the evaluation, selection, crossover, mutation, and replacement steps
until a termination condition is met (e.g., a fixed number of generations or a
satisfactory fitness level).
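A compact, self-contained sketch of this loop on the simple OneMax task (maximize the number of 1-bits in a binary string); every name and parameter value below is an illustrative assumption:

import random

CHROMOSOME_LENGTH = 20
POPULATION_SIZE = 30
GENERATIONS = 50
MUTATION_RATE = 0.02

def fitness(chromosome):
    return sum(chromosome)                        # OneMax: count the 1-bits

def tournament(population, k=3):
    return max(random.sample(population, k), key=fitness)

# 1. Initialization: random binary chromosomes.
population = [[random.randint(0, 1) for _ in range(CHROMOSOME_LENGTH)]
              for _ in range(POPULATION_SIZE)]

for generation in range(GENERATIONS):
    new_population = [max(population, key=fitness)]           # elitism: keep the best
    while len(new_population) < POPULATION_SIZE:
        # 2-3. Fitness evaluation and selection.
        parent1, parent2 = tournament(population), tournament(population)
        # 4. Crossover: single-point recombination.
        point = random.randint(1, CHROMOSOME_LENGTH - 1)
        child = parent1[:point] + parent2[point:]
        # 5. Mutation: flip bits with a small probability.
        child = [1 - b if random.random() < MUTATION_RATE else b for b in child]
        new_population.append(child)
    # 6. Replacement: the offspring form the next generation.
    population = new_population

# 7. Iteration ends after a fixed number of generations.
best = max(population, key=fitness)
print(best, fitness(best))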
Punctuated Equilibrium:
Punctuated equilibrium is the idea, borrowed from evolutionary biology, that populations
remain largely unchanged for long periods, interrupted by short bursts of rapid change.
When applied to genetic algorithms, this concept suggests that the algorithm may
experience long periods of little improvement followed by sudden leaps in performance.
Knapsack Problem:
The knapsack problem is a classic optimization challenge where you aim to maximize
the total value of items packed into a knapsack without exceeding its weight capacity.
Four Peaks Problem:
The Four Peaks Problem is a well-known benchmark problem used to evaluate the
performance of optimization algorithms, including genetic algorithms.
The problem involves finding the global optima in a landscape with two local optima and
two global optima. A candidate solution is a bit string, and the fitness rewards long
runs of leading 1s or trailing 0s, with a large bonus when both runs exceed a threshold T.
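A sketch of the commonly used Four Peaks fitness function (the threshold T and the example bit string are illustrative; this follows the usual formulation based on the run of leading 1s and the run of trailing 0s):

def four_peaks_fitness(bits, T):
    """Four Peaks fitness: max(leading 1s, trailing 0s) plus a bonus."""
    n = len(bits)
    head = 0                              # length of the run of leading 1s
    for b in bits:
        if b == 1:
            head += 1
        else:
            break
    tail = 0                              # length of the run of trailing 0s
    for b in reversed(bits):
        if b == 0:
            tail += 1
        else:
            break
    reward = n if (head > T and tail > T) else 0
    return max(head, tail) + reward

print(four_peaks_fitness([1, 1, 1, 0, 0, 0, 0, 0], T=2))   # head=3, tail=5, bonus=8 -> 13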
Limitations of GA:
1. Computational Complexity:
High Cost: GAs typically need many fitness evaluations over many generations, which can
be slow when the fitness function is expensive or the population is large.
2. Premature Convergence:
Local Optima: GAs may converge prematurely to local optima rather than finding
the global optimum. This happens when the population lacks diversity and
becomes similar too early.
Genetic Drift: Random sampling errors can lead to a loss of genetic diversity
over generations, reducing the algorithm's effectiveness.
3. Parameter Sensitivity:
Tuning Required: Performance depends strongly on parameters such as population size,
crossover rate, and mutation rate.
No Universal Settings: Different problems may require different parameter
settings, so there is no one-size-fits-all solution.
4. Representation Issues:
Encoding Problems: The choice of representation for solutions (binary, integer, real-
valued) can significantly impact the performance and complexity of the algorithm.
Complex Solutions: Some problems might have complex solution spaces that are
difficult to represent effectively with simple genetic encodings.
5. Scalability:
Large-Scale Problems: GAs may struggle with very large-scale problems due to the
exponential increase in the search space.
Parallel Processing Needed: Efficiently handling large populations often
requires parallel processing capabilities, which can be complex to implement.
6. Randomness:
Stochastic Nature: GAs rely heavily on random processes, which can lead to
inconsistent performance. Two runs with the same initial conditions may
produce different results.
Unpredictability: The stochastic nature means that sometimes the algorithm might
not find a good solution within a reasonable time frame.
*****