Unit-4 ML
Assumptions of LDA:
1. The features are (approximately) normally distributed within each class.
2. All classes share the same covariance matrix (homoscedasticity).
3. The observations are statistically independent of each other.
Steps:
1. Calculating mean vectors for each class.
2. Computing within-class and between-class scatter matrices to understand the
distribution and separation of classes.
3. Solving for the eigenvalues and eigenvectors that maximize the between-class
variance relative to the within-class variance. This defines the optimal projection
space to distinguish the classes.
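A minimal NumPy sketch of these steps for a small two-class example (the data and variable names below are illustrative assumptions, not part of the notes):

import numpy as np

# Hypothetical two-class dataset: rows are samples, columns are features.
X = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0],
              [9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

classes = np.unique(y)
overall_mean = X.mean(axis=0)
n_features = X.shape[1]

# Step 1: mean vector for each class.
means = {c: X[y == c].mean(axis=0) for c in classes}

# Step 2: within-class (S_w) and between-class (S_b) scatter matrices.
S_w = np.zeros((n_features, n_features))
S_b = np.zeros((n_features, n_features))
for c in classes:
    X_c = X[y == c]
    diff = X_c - means[c]
    S_w += diff.T @ diff
    mean_diff = (means[c] - overall_mean).reshape(-1, 1)
    S_b += len(X_c) * (mean_diff @ mean_diff.T)

# Step 3: eigenvectors of inv(S_w) @ S_b define the projection directions.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:1]].real          # top direction (at most C - 1 = 1 here)

X_lda = X @ W                           # data projected onto the LDA axis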
Applications:
Facial Recognition
Medical Diagnostics
Marketing-Customer Segmentation
Principal Component Analysis (PCA):
Unsupervised dimension reduction technique.
Principal Component Analysis (PCA) is a powerful technique used in data analysis,
particularly for reducing the dimensionality of datasets while preserving crucial
information.
Transforming the original variables into a set of new, uncorrelated variables called
principal components.
Reduces dimensions without considering class labels.
Useful for data visualization and simplifying complex data.
PCA identifies and uses the principal components (directions that maximize variance
and are orthogonal to each other) to effectively project data into a lower-dimensional
space.
Steps:
Standardize the Data
If the features of your dataset are on different scales, it’s essential to
standardize them (subtract the mean and divide by the standard
deviation).
Compute the Covariance Matrix
Calculate the covariance matrix for the standardized dataset.
Compute Eigenvectors and Eigenvalues
Find the eigenvectors and eigenvalues of the covariance matrix. The
eigenvectors represent the directions of maximum variance, and the
corresponding eigenvalues indicate the magnitude of variance along those
directions.
Sort Eigenvectors by Eigenvalues
Sort the eigenvectors based on their corresponding eigenvalues in descending
order.
Choose Principal Components
Select the top k eigenvectors (principal components) where k is the
desired dimensionality of the reduced dataset.
Transform the Data
Multiply the original standardized data by the selected principal
components to obtain the new, lower-dimensional representation of the data.
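A minimal NumPy sketch of these steps (the random data matrix and the choice k = 2 are illustrative assumptions):

import numpy as np

X = np.random.rand(100, 5)             # hypothetical dataset: 100 samples, 5 features
k = 2                                   # desired number of principal components

# 1. Standardize the data (zero mean, unit variance per feature).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Compute the covariance matrix of the standardized data.
cov = np.cov(X_std, rowvar=False)

# 3. Eigenvectors and eigenvalues of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort eigenvectors by eigenvalue in descending order.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 5. Choose the top k eigenvectors (principal components).
components = eigvecs[:, :k]

# 6. Transform the data into the lower-dimensional space.
X_reduced = X_std @ components
print(X_reduced.shape)                  # (100, 2)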
Applications:
Exploratory Data Analysis
Predictive Modeling
Image compression
Genomics for pattern recognition
Financial data for uncovering latent patterns and correlations
Visualization of Complex datasets
Factor Analysis:
Factor analysis is a statistical method used to describe variability among observed,
correlated variables in terms of a potentially lower number of unobserved variables
called factors.
Used to identify underlying, unmeasured variables (factors) that explain the
variability across observed variables.
Focuses on understanding latent structures in the data.
Useful for revealing relationships and reducing dimensions based on these latent
factors.
Assumptions:
1. Linearity: The relationships between variables and factors are assumed to be linear.
2. Adequate Sample Size: Factor analysis generally requires a sufficient sample size to
produce reliable results. The adequacy of the sample size can depend on factors such
as the complexity of the model and the ratio of variables to cases.
3. Uniqueness: Each variable should have unique variance that is not explained by the
factors. This assumption is particularly important in common factor analysis.
4. Linearity of Factor Scores: The relationship between the observed variables and the
latent factors is assumed to be linear, even though the observed variables may not be
linearly related to each other.
5. Interval or Ratio Scale: Factor analysis typically assumes that the variables are
measured on interval or ratio scales, as opposed to nominal or ordinal scales.
Steps:
1. Determine the Suitability of Data for Factor Analysis
Check that the variables are sufficiently correlated, for example using Bartlett's test of
sphericity or the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy.
2. Choose the Extraction Method
Principal Component Analysis (PCA): Used when the main goal is data reduction.
Principal Axis Factoring (PAF): Used when the main goal is to identify underlying factors.
3. Factor Extraction
Use the chosen extraction method to identify the initial factors.
Extract eigenvalues to determine the number of factors to retain. Factors
with eigenvalues greater than 1 are typically retained in the analysis.
Compute the initial factor loadings.
4. Determine the Number of Factors to Retain
Scree Plot: Plot the eigenvalues in descending order to visualize the point
where the plot levels off (the “elbow”) to determine the number of factors to
retain.
Eigenvalues: Retain factors with eigenvalues greater than 1.
5. Factor Rotation
Orthogonal Rotation (Varimax, Quartimax): Assumes that the factors are
uncorrelated.
Oblique Rotation (Promax, Oblimin): Allows the factors to be correlated.
Rotate the factors to achieve a simpler and more interpretable factor structure.
Examine the rotated factor loadings.
6. Interpret and Label the Factors
Analyze the rotated factor loadings to interpret the underlying meaning of
each factor.
Assign meaningful labels to each factor based on the variables with high
loadings on that factor.
7. Compute Factor Scores (if needed)
Calculate the factor scores for each individual to represent their value on
each factor.
8. Report and Validate the Results
Report the final factor structure, including factor loadings and communalities.
Validate the results using additional data or by conducting a confirmatory factor
analysis if necessary.
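A brief sketch of extraction, rotation, loadings, and communalities with scikit-learn's FactorAnalysis (the data are randomly generated for illustration, and the varimax rotation option assumes scikit-learn 0.24 or newer):

import numpy as np
from sklearn.decomposition import FactorAnalysis

X = np.random.rand(200, 6)                      # hypothetical observed variables
X = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize

# Extract 2 factors with a varimax (orthogonal) rotation.
fa = FactorAnalysis(n_components=2, rotation="varimax")
scores = fa.fit_transform(X)                    # factor scores for each observation

loadings = fa.components_.T                     # factor loadings: variables x factors
communalities = (loadings ** 2).sum(axis=1)     # variance of each variable explained by the factors

print(loadings.round(2))
print(communalities.round(2))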
Factor loading:
The correlation coefficient between an observed variable and an underlying factor is
called the factor loading.
Communalities:
The proportion of each observed variable’s variance that can be explained by the
factors.
Applications:
Dimensionality Reduction
Identifying Latent Constructs
Data Summarization
Hypothesis Testing
Variable Selection
Improving Predictive Models
Independent Component Analysis (ICA):
Independent Component Analysis (ICA) is an unsupervised technique that separates a
multivariate signal into additive components that are statistically independent of each other.
Assumptions:
The first assumption asserts that the source signals (original signals) are statistically
independent of each other.
The second assumption is that each source signal exhibits non-Gaussian distributions.
Steps:
1. Centering adjusts the data to have a zero mean, ensuring that analyses focus on
variance rather than mean differences.
2. Whitening transforms the data into uncorrelated variables, simplifying the
subsequent separation process.
3. After these steps, ICA applies iterative methods to separate independent components,
and it often uses auxiliary methods like PCA or singular value decomposition (SVD)
to lower the number of dimensions at the start. This sets the stage for efficient and
robust component extraction.
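A short sketch using scikit-learn's FastICA to unmix two artificially mixed signals (the source signals and mixing matrix are illustrative assumptions):

import numpy as np
from sklearn.decomposition import FastICA

# Two hypothetical independent, non-Gaussian source signals.
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                        # sinusoidal source
s2 = np.sign(np.sin(3 * t))               # square-wave source
S = np.c_[s1, s2]

# Observed signals: an (assumed) linear mixture of the sources.
A = np.array([[1.0, 0.5], [0.5, 2.0]])    # hypothetical mixing matrix
X = S @ A.T

# FastICA centers and whitens the data internally, then iteratively
# extracts statistically independent components.
ica = FastICA(n_components=2, random_state=0)
S_estimated = ica.fit_transform(X)        # recovered independent components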
Applications:
Blind source separation (e.g., separating mixed audio recordings, the "cocktail party problem")
Artifact removal from EEG and fMRI signals
Feature extraction from images and other high-dimensional data
Locally Linear Embedding (LLE):
Locally Linear Embedding (LLE) is an unsupervised, nonlinear dimensionality reduction
technique that preserves the local neighborhood structure of the data.
Advantages:
Handling Non-Linearity: LLE can capture nonlinear patterns and structures in the data,
in contrast to linear techniques like Principal Component Analysis (PCA). It is
especially helpful when working with complicated, curved, or twisted datasets.
Dimensionality Reduction: LLE lowers the dimensionality of the data while
preserving its fundamental properties. Particularly when working with high-
dimensional datasets, this reduction makes data presentation, exploration, and analysis
simpler.
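A small sketch using scikit-learn's LocallyLinearEmbedding on the swiss-roll toy dataset (the dataset and parameter choices are illustrative):

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Hypothetical nonlinear dataset: a 3-D "swiss roll" manifold.
X, color = make_swiss_roll(n_samples=1000, random_state=0)

# Unroll the manifold into 2 dimensions using 12 nearest neighbors.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
X_2d = lle.fit_transform(X)
print(X_2d.shape)   # (1000, 2)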
Isomap:
Isomap (short for isometric mapping) is a nonlinear dimensionality reduction method
used in data analysis and machine learning.
This technique works especially well for extracting the underlying structure from
large, complex datasets, like those from speech recognition, image analysis, and
biological systems.
Steps:
1. Calculate the pairwise distances: The algorithm starts by calculating the Euclidean
distances between the data points.
2. Find nearest neighbors: For each data point, its k nearest neighbors are determined
using these distances.
3. Create a neighborhood graph: Each point is connected by edges to its nearest
neighbors, producing a graph that represents the data's local structure.
4. Calculate geodesic distances: A shortest-path algorithm (such as Floyd-Warshall or
Dijkstra) finds the shortest path between every pair of points in the neighborhood
graph; these shortest-path lengths approximate the geodesic distances along the manifold.
5. Perform dimensionality reduction: Classical Multidimensional Scaling (MDS) is
applied to the geodesic distance matrix, producing a low-dimensional embedding of the data.
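A small sketch of these steps using scikit-learn's Isomap on the S-curve toy dataset (the dataset and parameter choices are illustrative):

from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

# Hypothetical nonlinear dataset: a 3-D "S-curve" manifold.
X, color = make_s_curve(n_samples=1000, random_state=0)

# Internally: k-nearest-neighbor graph, shortest-path (geodesic) distances,
# then classical MDS to embed the data in 2 dimensions.
iso = Isomap(n_neighbors=10, n_components=2)
X_2d = iso.fit_transform(X)
print(X_2d.shape)   # (1000, 2)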
Advantages:
Captures nonlinear manifold structure that linear methods such as PCA miss, by using
geodesic rather than straight-line (Euclidean) distances.
Preserves the global geometry of the data, not just local neighborhoods.
Applications:
Visualization: High-dimensional data like face images can be visualized in a lower-
dimensional space, enabling easier exploration and understanding.
Data exploration: Isomap can help identify clusters and patterns within the data that
are not readily apparent in the original high-dimensional space.
Anomaly detection: Outliers that deviate significantly from the underlying manifold
can be identified using Isomap.
Machine learning tasks: Isomap can be used as a pre-processing step for other
machine learning tasks, such as classification and clustering, by improving the
performance and interpretability of the models.
Least Squares Optimization:
Least squares optimization is a mathematical technique that minimizes the sum of
squared residuals to find the best-fitting curve for a set of data points.
It's a type of regression analysis that's often used by statisticians and traders to
identify trends and trading opportunities.
Steps:
1. Determine the equation of the line you believe best fits the data.
Denote the independent variable values as xi and the dependent ones as yi.
Calculate the average values of xi and yi as X and Y.
Presume the equation of the line of best fit as y = mx + c, where m is the slope of the
line and c represents the intercept of the line on the Y-axis.
The slope m and intercept c can be calculated from the following formulas:
m = sum((xi - X)(yi - Y)) / sum((xi - X)^2) and c = Y - m*X.
2. Calculate the residuals (differences) between the observed values and the values
predicted by your model.
3. Square each of these residuals and sum them up.
4. Adjust the model to minimize this sum.
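A minimal NumPy sketch of these steps (the data points are illustrative):

import numpy as np

# Hypothetical data points.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.1, 5.9, 8.2, 9.8])

X_mean, Y_mean = x.mean(), y.mean()

# Slope and intercept from the least-squares formulas.
m = np.sum((x - X_mean) * (y - Y_mean)) / np.sum((x - X_mean) ** 2)
c = Y_mean - m * X_mean

residuals = y - (m * x + c)             # observed minus predicted values
sse = np.sum(residuals ** 2)            # sum of squared residuals being minimized
print(m, c, sse)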
Evolutionary Learning:
Evolutionary learning is a type of machine learning that applies evolutionary
algorithms to solve optimization problems.
These algorithms mimic the process of natural selection, where candidate solutions to
a problem are randomly generated and evaluated based on a fitness function.
The best solutions are then selected and combined to create new solutions, iteratively
improving over time until an optimal or near-optimal solution is found.
Genetic algorithm:
The Genetic Algorithm is a computational approximation to how evolution performs
search, which is by producing modifications of the parent genomes in their offspring
and thus producing new individuals with different fitness.
We need to model simple genetics inside a computer and solve problems using:
o a method for representing problems as chromosomes
o a way to calculate the fitness of a solution
o a selection method to choose parents
o a way to generate offspring by breeding the parents
Example: Solving a Knapsack Problem using Genetic algorithm rather than greedy
approach.
String Representation:
In genetic algorithms, string representation (also known as chromosome encoding) is
a method of representing potential solutions to an optimization problem.
This representation is crucial because it impacts how genetic operators like crossover
and mutation are applied.
Here are some common string representations used in genetic algorithms:
1. Binary Representation
Description: Each individual is represented as a string of binary digits (0s and 1s).
Example: 10101011
Usage: Often used for problems where solutions can be naturally expressed as binary
numbers, such as combinatorial problems.
2. Integer Representation
Description: Each individual is represented as a string of integers.
Example: 5 2 9 1
Usage: Suitable for problems whose decision variables take discrete values, such as
scheduling or routing.
3. Real-Valued Representation
Description: Each individual is represented as a vector of real numbers.
Example: 3.14 2.71 1.41 0.577
Usage: Suitable for optimization problems in continuous search spaces, such as
function optimization.
4. Permutation Representation
Description: Each individual is an ordering (permutation) of a set of elements.
Example: 3 1 4 2
Usage: Used for ordering problems such as the Travelling Salesman Problem, where each
element must appear exactly once.
Evaluating Fitness:
Evaluating fitness in genetic algorithms is a crucial step to determine how well each
individual (or solution) in the population solves the given problem.
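For instance, with a binary chromosome for the knapsack problem mentioned earlier, a fitness function might look like the following sketch (the item values, weights, and capacity are hypothetical):

# Hypothetical knapsack instance.
values = [60, 100, 120, 30]     # value of each item
weights = [10, 20, 30, 5]       # weight of each item
capacity = 50                   # knapsack capacity

def fitness(chromosome):
    """Total value of selected items; 0 if the weight limit is exceeded."""
    total_value = sum(v for v, bit in zip(values, chromosome) if bit)
    total_weight = sum(w for w, bit in zip(weights, chromosome) if bit)
    return total_value if total_weight <= capacity else 0

print(fitness([1, 0, 1, 1]))    # items 1, 3 and 4 selected -> value 210, weight 45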
Population:
Population refers to a collection of potential solutions to the problem.
Role of the Population:
The population is the pool of candidate solutions from which parents are selected; its
size and diversity determine how widely the search space is explored.
Selection Methods:
1. Roulette Wheel Selection
Mechanism: Each individual is selected with a probability proportional to its fitness,
like spinning a roulette wheel whose slots are sized by fitness.
2. Tournament Selection
Mechanism: A small group of individuals is sampled at random, and the fittest member of
the group is selected as a parent.
3. Rank Selection
Mechanism: Rank all individuals based on their fitness and assign selection
probabilities based on these ranks.
Advantages: Reduces the risk of premature convergence by ensuring even low-fitness
individuals have a chance of being selected.
Disadvantages: Requires sorting the population, which can add computational
overhead.
4. Stochastic Universal Sampling
Mechanism: Similar to roulette wheel selection but uses multiple, evenly spaced
selection points to ensure a more even spread of selected individuals.
Advantages: Provides a more uniform selection of individuals.
Disadvantages: Can be more complex to implement compared to roulette
wheel selection.
5. Truncation Selection
Mechanism: Only the top fraction of the population, ranked by fitness, is eligible to
become parents; the rest are discarded.
Genetic Operators:
Genetic operators are the mechanisms that drive the evolutionary process in genetic
algorithms.
They modify the genetic composition of the population to create new individuals
(solutions) with the goal of improving overall fitness.
Here are the primary genetic operators (a short code sketch follows this list):
1. Selection
Purpose: To choose individuals from the population to act as parents for the
next generation.
Methods:
o Roulette Wheel Selection: Probability of selection is proportional to fitness.
o Tournament Selection: A subset of individuals compete, and the fittest
is selected.
o Rank Selection: Individuals are ranked, and selection is based on rank.
2. Crossover (Recombination)
Purpose: To combine genetic material from two parents to produce new offspring.
This operator mimics biological reproduction and introduces new genetic
structures into the population.
Types:
o Single-point Crossover: One crossover point is chosen, and segments are
swapped between the parents.
o Multi-point Crossover: Two crossover points are chosen, and segments
between them are swapped.
o Uniform Crossover: Each gene is independently chosen from one of
the parents with a specific probability.
3. Mutation
Purpose: To introduce small random changes into individuals, maintaining genetic
diversity and helping the search escape local optima.
Example: flipping a single bit of a binary chromosome, e.g. 10101011 -> 10001011.
4. Replacement
Purpose: To decide which individuals of the current population are replaced by the new
offspring, for example generational replacement (the whole population is replaced) or
steady-state replacement (only a few individuals are replaced each generation).
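A minimal sketch of these operators for binary chromosomes represented as Python lists of 0s and 1s (the function names and parameters are illustrative assumptions):

import random

def tournament_selection(population, fitness_fn, k=3):
    """Pick k random individuals and return the fittest of them."""
    contestants = random.sample(population, k)
    return max(contestants, key=fitness_fn)

def single_point_crossover(parent1, parent2):
    """Swap the tails of two parents after a random crossover point."""
    point = random.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def mutate(chromosome, rate=0.01):
    """Flip each bit independently with probability rate."""
    return [1 - bit if random.random() < rate else bit for bit in chromosome]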
Elitism:
Elitism is a strategy used to ensure that the best individuals (those with the
highest fitness) from the current generation are carried over to the next generation
without any changes. This helps in preserving the best solutions found so far.
Purpose: To avoid losing the best solutions due to the random nature of
selection, crossover, or mutation.
Implementation: A fixed number of the top-performing individuals (elites) are
directly copied to the next generation.
Benefit: Increases the chances of convergence to a global optimum by
preserving high-quality solutions.
Niching:
Niching techniques maintain diversity in the population by encouraging it to form and
preserve subpopulations (niches), each of which can converge to a different optimum.
Methods:
o Fitness Sharing: Reduces the fitness of individuals that are close to
each other, effectively spreading the population across different niches.
o Crowding: Limits the replacement of similar individuals to preserve diversity.
o Clearing: Allocates a limited number of resources (reproductive
opportunities) to individuals in each niche.
Benefits:
o Helps in exploring multiple solutions simultaneously.
o Prevents premature convergence to a single solution by maintaining diversity.
Challenges: Implementing and tuning niching methods can be complex and
computationally intensive.
Steps of a Genetic Algorithm (a code sketch follows these steps):
1. Initialization:
o Begin with a randomly generated population of potential solutions,
called chromosomes or individuals.
2. Fitness Evaluation:
o Each individual is evaluated using a fitness function that measures how well it
solves the problem at hand.
3. Selection:
o Select individuals based on their fitness scores to be parents.
Common methods include roulette wheel selection and tournament
selection.
4. Crossover (Recombination):
o Combine pairs of parents to produce offspring. This mimics
biological reproduction and allows the mixing of genetic information.
5. Mutation:
o Introduce random changes to some individuals to maintain genetic
diversity and explore new solutions.
6. Replacement:
o Replace some or all of the population with new offspring, depending on
the specific algorithm.
7. Iteration:
o Repeat the evaluation, selection, crossover, mutation, and replacement steps
until a termination condition is met (e.g., a fixed number of generations or a
satisfactory fitness level).
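A compact, self-contained sketch of this loop on the simple OneMax task (maximize the number of 1-bits in a binary string); every name and parameter value below is an illustrative assumption:

import random

CHROMOSOME_LENGTH = 20
POPULATION_SIZE = 30
GENERATIONS = 50
MUTATION_RATE = 0.02

def fitness(chromosome):
    return sum(chromosome)                        # OneMax: count the 1-bits

def tournament(population, k=3):
    return max(random.sample(population, k), key=fitness)

# 1. Initialization: random binary chromosomes.
population = [[random.randint(0, 1) for _ in range(CHROMOSOME_LENGTH)]
              for _ in range(POPULATION_SIZE)]

for generation in range(GENERATIONS):
    new_population = [max(population, key=fitness)]           # elitism: keep the best
    while len(new_population) < POPULATION_SIZE:
        # 2-3. Fitness evaluation and selection.
        parent1, parent2 = tournament(population), tournament(population)
        # 4. Crossover: single-point recombination.
        point = random.randint(1, CHROMOSOME_LENGTH - 1)
        child = parent1[:point] + parent2[point:]
        # 5. Mutation: flip bits with a small probability.
        child = [1 - b if random.random() < MUTATION_RATE else b for b in child]
        new_population.append(child)
    # 6. Replacement: the offspring form the next generation.
    population = new_population

# 7. Iteration ends after a fixed number of generations.
best = max(population, key=fitness)
print(best, fitness(best))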
Punctuated Equilibrium:
Punctuated equilibrium is the idea, borrowed from evolutionary biology, that populations
remain largely unchanged for long periods, interrupted by short bursts of rapid change.
When applied to genetic algorithms, this concept suggests that the algorithm may
experience long periods of little improvement followed by sudden leaps in performance.
Knapsack Problem:
The knapsack problem is a classic optimization challenge where you aim to maximize
the total value of items packed into a knapsack without exceeding its weight capacity.
Four Peaks Problem:
The Four Peaks Problem is a well-known benchmark problem used to evaluate the
performance of optimization algorithms, including genetic algorithms.
The problem involves finding the global optima in a landscape with two local optima and
two global optima. A candidate solution is a bit string, and the fitness rewards long
runs of leading 1s or trailing 0s, with a large bonus when both runs exceed a threshold T.
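A sketch of the commonly used Four Peaks fitness function (the threshold T and the example bit string are illustrative; this follows the usual formulation based on the run of leading 1s and the run of trailing 0s):

def four_peaks_fitness(bits, T):
    """Four Peaks fitness: max(leading 1s, trailing 0s) plus a bonus."""
    n = len(bits)
    head = 0                              # length of the run of leading 1s
    for b in bits:
        if b == 1:
            head += 1
        else:
            break
    tail = 0                              # length of the run of trailing 0s
    for b in reversed(bits):
        if b == 0:
            tail += 1
        else:
            break
    reward = n if (head > T and tail > T) else 0
    return max(head, tail) + reward

print(four_peaks_fitness([1, 1, 1, 0, 0, 0, 0, 0], T=2))   # head=3, tail=5, bonus=8 -> 13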
Limitations of GA:
1. Computational Complexity:
High Cost: GAs typically need many fitness evaluations over many generations, which can
be slow when the fitness function is expensive or the population is large.
2. Premature Convergence:
Local Optima: GAs may converge prematurely to local optima rather than finding
the global optimum. This happens when the population lacks diversity and
becomes similar too early.
Genetic Drift: Random sampling errors can lead to a loss of genetic diversity
over generations, reducing the algorithm's effectiveness.
3. Parameter Sensitivity:
Tuning Required: Performance depends strongly on parameters such as population size,
crossover rate, and mutation rate.
No Universal Settings: Different problems may require different parameter
settings, so there is no one-size-fits-all solution.
4. Representation Issues:
Encoding Problems: The choice of representation for solutions (binary, integer, real-
valued) can significantly impact the performance and complexity of the algorithm.
Complex Solutions: Some problems might have complex solution spaces that are
difficult to represent effectively with simple genetic encodings.
5. Scalability:
Large-Scale Problems: GAs may struggle with very large-scale problems due to the
exponential increase in the search space.
Parallel Processing Needed: Efficiently handling large populations often
requires parallel processing capabilities, which can be complex to implement.
6. Randomness:
Stochastic Nature: GAs rely heavily on random processes, which can lead to
inconsistent performance. Two runs with the same initial conditions may
produce different results.
Unpredictability: The stochastic nature means that sometimes the algorithm might
not find a good solution within a reasonable time frame.
*****