Machine Learning: Dr. Jagan. T, Professor, Department of ECE, GRIET
Unit-I
Dr. Jagan. T
Professor
Department of ECE, GRIET
Introduction to Machine Learning
• Machine learning is a subfield of artificial intelligence (AI) that focuses on enabling
computers to learn from data without being explicitly programmed. It involves algorithms
and statistical models that allow computers to improve their performance on a specific task
over time, based on the data they are exposed to.
Key Concepts in Machine Learning:
• Algorithms: These are sets of rules and procedures that guide the learning process.
Different algorithms are suited for different types of data and tasks.
• Data: Machine learning models require data to learn from. This data can be labeled (for
supervised learning) or unlabeled (for unsupervised learning).
• Training: The process of feeding data to the algorithm to enable it to learn patterns and
relationships.
• Model: The output of the training process, representing the learned patterns and used for
making predictions or decisions on new data.
Types of Machine Learning
• Supervised Learning: The algorithm learns from labeled data, where the correct
output is provided. Examples include classification (e.g., spam detection) and
regression (e.g., predicting house prices); see the sketch after this list.
• Unsupervised Learning: The algorithm learns from unlabeled data, discovering
structure such as clusters or lower-dimensional representations on its own.
• Reinforcement Learning: The algorithm learns through trial and error, receiving
rewards or penalties for its actions. It is often used in robotics and game playing.
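As a concrete illustration of supervised learning, here is a minimal scikit-learn sketch; the Iris dataset and the logistic-regression model are assumptions chosen for illustration, not something specified in these notes.

```python
# Minimal supervised-learning sketch: train a classifier on labeled data,
# then let the resulting model predict labels for unseen samples.
# (Dataset and model choice are illustrative assumptions.)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                # labeled data: features X, targets y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)        # hold out data the model never sees

model = LogisticRegression(max_iter=1000)        # the algorithm
model.fit(X_train, y_train)                      # training: learn patterns from the data
y_pred = model.predict(X_test)                   # the model makes predictions on new data
print("Test accuracy:", accuracy_score(y_test, y_pred))
```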
Challenges and Related Concepts:
• Bias: Machine learning models can inherit biases present in the data.
• Deep Learning: Using artificial neural networks with multiple layers to learn
complex patterns.
Feature Engineering:
•Techniques such as binning, scaling, creating interaction features, or applying
domain-specific transformations.
Feature Selection:
•The process of choosing the most relevant features for the model.
Data Transformation:
•Operations such as one-hot encoding, normalization, or log transformation.
Feature Storage:
•Where the final engineered features are stored, such as a data warehouse or a
dedicated feature store. (A small pipeline sketch follows this list.)
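The stages above can be wired together in code. Below is a hedged sketch of such a pipeline with scikit-learn; the column names, the toy data, and the choice of SelectKBest for the selection step are hypothetical and only meant to show how the pieces connect (a production feature store is outside the scope of this sketch).

```python
# Feature engineering pipeline sketch (hypothetical columns and data):
# scale numeric features, one-hot encode categorical ones, then keep the
# most relevant engineered features.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.feature_selection import SelectKBest, f_classif

df = pd.DataFrame({
    "age":    [25, 32, 47, 51],
    "income": [30000, 42000, 58000, 61000],
    "city":   ["Hyderabad", "Delhi", "Hyderabad", "Mumbai"],
    "bought": [0, 1, 1, 0],                                    # target label
})

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),              # data transformation
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

pipeline = Pipeline([
    ("features", preprocess),
    ("select", SelectKBest(f_classif, k=3)),                   # feature selection
])

X = pipeline.fit_transform(df.drop(columns="bought"), df["bought"])
print("Engineered feature matrix shape:", X.shape)
```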
Why Bother with Feature Engineering?
•Improved Model Performance: Better features lead to more accurate and robust models.
•Faster Training: Relevant features can reduce training time.
•Enhanced Interpretability: Engineered features can make models easier to understand.
•Handles Missing Data: Feature engineering can address missing values effectively.
•Addresses Outliers: Techniques can mitigate the impact of outliers.
(Figure: graph showing improved model performance with feature engineering.)
Types of Feature Engineering Techniques
Brief explanations and examples of each technique:
• Data Cleaning: Handling missing values, outliers, and inconsistencies. (e.g.,
imputation, outlier removal)
• Feature Scaling: Normalization and standardization. (e.g., Min-Max scaling, Z-score
normalization)
• Encoding Categorical Variables: One-hot encoding, label encoding. (e.g.,
converting colors to numerical representations)
• Creating New Features: Polynomial features, interaction features, domain-specific
features. (e.g., creating BMI from height and weight)
• Feature Transformation: Log transformation, square root transformation. (e.g.,
handling skewed data)
• Feature Selection: Identifying the most relevant features. (e.g., filter methods,
wrapper methods, embedded methods)
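The Data Cleaning item above can be made concrete with a short sketch; the column names, the toy values, and the percentile thresholds below are made up purely for illustration.

```python
# Data-cleaning sketch: impute missing values and cap an implausible outlier.
# (Column names, values, and thresholds are hypothetical.)
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "age":    [25, np.nan, 40, 120, 31],           # 120 looks like an outlier
    "income": [30000, 42000, np.nan, 55000, 38000],
})

# Imputation: replace missing values with the column median.
imputer = SimpleImputer(strategy="median")
df[["age", "income"]] = imputer.fit_transform(df[["age", "income"]])

# Outlier handling: clip "age" to its 1st-99th percentile range.
lo, hi = df["age"].quantile([0.01, 0.99])
df["age"] = df["age"].clip(lower=lo, upper=hi)
print(df)
```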
Feature Scaling (Standardization)
• Standardization only shifts the mean to 0 and rescales the standard deviation to 1; a
normally distributed feature therefore maps to a standard normal distribution, and the
shape of the distribution is not affected.
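A small sketch of scaling on synthetic data (the normal distribution used here is just an assumption to make the before/after statistics easy to read):

```python
# Feature-scaling sketch: z-score standardization vs. Min-Max normalization.
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=(1000, 1))   # synthetic feature, mean ~50, std ~10

z = StandardScaler().fit_transform(x)              # z = (x - mean) / std  -> mean 0, std 1
m = MinMaxScaler().fit_transform(x)                # rescales values into the [0, 1] range

print(round(x.mean(), 2), round(x.std(), 2))       # ~50, ~10
print(round(z.mean(), 2), round(z.std(), 2))       # ~0, ~1
print(round(m.min(), 2), round(m.max(), 2))        # 0, 1
```

Only the location and spread change; the shape of the distribution stays the same.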
Encoding Categorical Variables
• One-Hot Encoding: Creates binary
columns for each category.
• Label Encoding: Assigns a unique integer
to each category.
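A minimal sketch of both encodings for a hypothetical "color" column:

```python
# Encoding sketch: one-hot encoding vs. label encoding for a categorical feature.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot encoding: one binary column per category.
one_hot = pd.get_dummies(df["color"], prefix="color")

# Label encoding: a unique integer per category (implies an ordering, so use with care).
df["color_label"] = LabelEncoder().fit_transform(df["color"])

print(one_hot)
print(df)
```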
Feature Transformation
By taking the logarithm of a data set, large values are compressed, effectively reducing the
impact of extreme outliers and making the distribution appear more symmetrical. This is
particularly useful when data is skewed to the right, with a few very large values.
The square root transformation has a milder effect on the data than the log transformation,
making it suitable when the skewness is less extreme. It can also be applied to data that
contains zeros, which the plain log transformation cannot handle.
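As a quick numerical sketch of both transformations (the right-skewed sample below is synthetic):

```python
# Feature-transformation sketch: log and square-root transforms on right-skewed data.
import numpy as np

x = np.random.default_rng(1).lognormal(mean=0.0, sigma=1.0, size=1000)  # right-skewed

log_x = np.log1p(x)    # log(1 + x): strongly compresses large values, defined for zeros
sqrt_x = np.sqrt(x)    # milder compression than the log transform

# A crude skewness indicator: how far the maximum sits above the mean.
print(x.max() / x.mean(), log_x.max() / log_x.mean(), sqrt_x.max() / sqrt_x.mean())
```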
Feature Selection
•Wrapper Methods:
•Pros: Can potentially find optimal feature subsets for a specific model.
•Cons: Computationally expensive, can overfit to the chosen model.
•Examples: Forward selection, backward elimination, recursive feature elimination
(RFE).
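A hedged sketch of one wrapper method, recursive feature elimination (RFE), using scikit-learn; the breast-cancer dataset and the logistic-regression estimator are illustrative choices.

```python
# Wrapper-method sketch: RFE repeatedly fits the model and drops the weakest feature.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

selector = RFE(LogisticRegression(max_iter=5000), n_features_to_select=5)
selector.fit(X, y)

kept = [i for i, keep in enumerate(selector.support_) if keep]
print("Indices of the 5 selected features:", kept)
```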
Dimensionality Reduction
• Data Visualization: It's often easier to visualize data in 2D or 3D. Dimensionality reduction
can help project high-dimensional data into a lower-dimensional space for visualization.
Common Techniques:
• Principal Component Analysis (PCA): A linear technique that finds
the directions of maximum variance in the data and projects the data
onto those directions.
• Linear Discriminant Analysis (LDA): A technique that finds the
linear combinations of features that best separate different classes.
Principal Component Analysis (PCA)
• Its primary goal is to reduce the dimensionality of the data while preserving as much variance as
possible.
• PCA is an unsupervised algorithm that creates linear combinations of the original features, known as
principal components.
• These components are calculated such that the first one captures the maximum variance in the dataset,
while each subsequent component explains the remaining variance without being correlated with the
previous ones.
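A short scikit-learn sketch of PCA; the Iris dataset and the choice of two components are assumptions made for illustration.

```python
# PCA sketch: standardize the features, then project onto the first two principal components.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)    # PCA is sensitive to feature scale

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print("Explained variance ratio:", pca.explained_variance_ratio_)  # PC1 captures the most
print("Reduced shape:", X_pca.shape)                               # (150, 2)
```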
Advantages of Principal Component Analysis (continued)
3. Data Compression: Represents data with fewer components, reducing storage needs
and speeding up processing.
4. Outlier Detection: Identifies unusual data points by showing which ones deviate
significantly in the reduced space.
Disadvantages of Principal Component Analysis
1. Interpretation Challenges: The new components are combinations of original
variables, which can be hard to explain.
2. Data Scaling Sensitivity: Requires proper scaling of data before application, or results
may be misleading.
3. Information Loss: Reducing dimensions may lose some important information if too
few components are kept.
4. Assumption of Linearity: Works best when relationships between variables are linear,
and may struggle with non-linear data.
5. Risk of Overfitting: Using too many components or working with a small dataset
might lead to models that don’t generalize well.
PCA: Example
Let’s understand how it works in simple terms:
Imagine you’re looking at a messy cloud of data points (like stars in the sky) and want to
simplify it. PCA helps you find the “most important angles” to view this cloud so you don’t
miss the big patterns. Here’s how it works, step by step:
•For a square matrix A, each eigenvector X and its corresponding eigenvalue λ satisfy the
equation AX = λX: multiplying by A only scales X, so the direction of X remains unchanged
(hence, eigenvectors define the “stable directions” of A).
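A tiny numerical check of AX = λX with NumPy; the 2×2 symmetric matrix below is an arbitrary example.

```python
# Eigenvector check: multiplying an eigenvector by A only scales it by its eigenvalue.
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])              # arbitrary symmetric matrix for illustration

eigenvalues, eigenvectors = np.linalg.eig(A)
X = eigenvectors[:, 0]                  # first eigenvector (a column of the result)
lam = eigenvalues[0]                    # its eigenvalue

print(A @ X)                            # same direction as X ...
print(lam * X)                          # ... only scaled by lambda
```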
PCA: Example
•Keep only the top 2–3 directions (or enough to capture ~95% of the variance).
•Project the data onto these directions to get a simplified, lower-dimensional version.
PCA is an unsupervised learning algorithm, meaning it doesn’t require prior knowledge of
target variables.
It’s commonly used in exploratory data analysis and machine learning to simplify datasets
without losing critical information.
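The “~95% of the variance” rule of thumb above can be applied directly: scikit-learn can choose the number of components for a target variance. The digits dataset below is an illustrative assumption.

```python
# PCA with a variance target: keep just enough components to explain ~95% of the variance.
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)              # 64 pixel features per image
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=0.95)                     # a fraction in (0, 1) means "variance to retain"
X_reduced = pca.fit_transform(X_scaled)
print("Components kept:", pca.n_components_)     # far fewer than the original 64 features
```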
This may all sound complicated, so let’s go through it again with the help of a visual
example, where the x-axis (Radius) and y-axis (Area) represent two original features in the
dataset.
PCA: Example
•The data points (blue dots) are projected onto PC₁, effectively reducing the dataset from
two dimensions (Radius, Area) to one dimension (PC₁).
•This transformation simplifies the dataset while retaining most of the original variability.
•The image visually explains why PCA selects the direction with the highest variance (PC₁).
•By removing PC₂, we reduce redundancy while keeping essential information.
•The transformation helps in data compression, visualization, and improved model
performance.
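The steps described in this example can be reproduced from scratch; in the sketch below the two synthetic columns merely stand in for Radius and Area, and NumPy’s eigen decomposition plays the role of finding PC₁.

```python
# From-scratch PCA sketch: center the data, compute the covariance matrix,
# take its top eigenvector (PC1), and project the 2-D points onto it.
import numpy as np

rng = np.random.default_rng(0)
radius = rng.normal(10, 2, 200)
area = np.pi * radius**2 + rng.normal(0, 5, 200)   # strongly correlated with radius
X = np.column_stack([radius, area])

X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)             # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)             # sorted ascending for symmetric matrices

pc1 = eigvecs[:, -1]                               # direction of maximum variance
projected = X_centered @ pc1                       # 2-D data reduced to 1-D scores
print("Variance captured by PC1:", eigvals[-1] / eigvals.sum())
```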
Linear Discriminant Analysis in Machine Learning
Linear Discriminant Analysis (LDA) is another technique that helps in reducing the
dimensionality of data while retaining the most significant features for classification tasks.
It works by finding the linear combinations of features that best separate the classes in the
dataset. In this section we will learn how it works and then see a small implementation
sketch in Python (after the conceptual discussion below).
Linear Discriminant Analysis in Machine Learning
For example, when data points belonging to two classes are plotted and they are not
linearly separable, LDA will attempt to find a projection that maximizes class
separability.
Linear Discriminant Analysis in Machine Learning
The image shows an example where the classes (black and green circles) are not linearly
separable. LDA uses both axes (X and Y) to generate a new axis in such a way that it
maximizes the distance between the means of the two classes while minimizing the
variation within each class. This transforms the dataset into a space where the classes are
better separated.
Linear Discriminant Analysis in Machine Learning
After transforming the data points along a new axis, LDA maximizes the class
separation. This new axis allows for clearer classification by projecting the data
along a line that enhances the distance between the means of the two classes.
Linear Discriminant Analysis in Machine Learning
• The perpendicular distance between the decision boundary and the data points helps
us visualize how LDA works: it reduces within-class variation and increases
separability.
• After generating this new axis using the above-mentioned criteria, all the data
points of the classes are plotted on this new axis and are shown in the figure
given below.
Linear Discriminant Analysis in Machine Learning
• It shows how LDA creates a new axis to project the data and separate the two
classes effectively along a linear path.
• But it fails when the means of the class distributions coincide, as it then becomes
impossible for LDA to find a new axis that makes both classes linearly separable.
In such cases we use non-linear discriminant analysis.
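As promised earlier, here is a minimal implementation sketch in Python using scikit-learn’s LinearDiscriminantAnalysis; the Iris dataset is an assumption used only for illustration.

```python
# LDA sketch: supervised dimensionality reduction followed by classification.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

lda = LinearDiscriminantAnalysis(n_components=2)   # at most (number of classes - 1) components
X_train_lda = lda.fit_transform(X_train, y_train)  # uses the class labels, unlike PCA

print("Reduced training shape:", X_train_lda.shape)
print("Test accuracy:", accuracy_score(y_test, lda.predict(X_test)))
```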
Advantages of LDA
•Simple and computationally efficient.
•Works well even when the number of features is much larger than the number of training
samples.
•Can handle multicollinearity.
Disadvantages of LDA
•Assumes Gaussian distribution of data which may not always be the case.
•Assumes equal covariance matrices for different classes which may not hold in all datasets.
Applications of LDA
Linear Discriminant Analysis (LDA) is a technique for dimensionality reduction that not
only simplifies high-dimensional data but also enhances the performance of models by
maximizing class separability.