ML Notes
Feature extraction:
Feature extraction in machine learning is the process of transforming raw data into a set of
meaningful features that can be used by a machine learning model to make predictions or
decisions.
Raw data (like images, text, audio, or sensor data) is often complex and unstructured. Machine
learning models work better when the input data is represented in a numerical and structured
form. Feature extraction helps by:
Principal Component Analysis (PCA):
● PCA is a technique used to reduce the number of features (dimensions) in a dataset
while keeping as much of the important information (variance) as possible.
● It finds new axes (called principal components) that are linear combinations of the original
features.
● These axes are chosen so that:
o The first one captures the most variance in the data.
o The second one captures the most remaining variance while being orthogonal to the first, and so on.
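A minimal PCA sketch in NumPy, assuming rows of `X` are samples and columns are features. It centers the data, eigendecomposes the covariance matrix, and keeps the top-`k` eigenvectors as the principal axes, ordered by the variance they capture. The function name `pca` and the synthetic data are hypothetical.

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)              # PCA assumes zero-mean features
    cov = np.cov(X_centered, rowvar=False)       # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # largest variance first
    components = eigvecs[:, order[:k]]           # columns = principal axes (orthonormal)
    explained = eigvals[order[:k]] / eigvals.sum()   # fraction of total variance kept
    return X_centered @ components, components, explained

rng = np.random.default_rng(0)
# Two strongly correlated features plus one independent noise feature
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=200), rng.normal(size=200)])
Z, components, explained = pca(X, k=2)
print(Z.shape, explained.round(3))
```

Because the first two features are nearly collinear, most of the variance lands on the first component, which is exactly the redundancy PCA is meant to compress.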
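SVD (Singular Value Decomposition) is a mathematical technique from linear algebra. It factorizes any matrix X into three matrices:
$X = U \Sigma V^{T}$
Where:
● U: matrix whose columns are the left singular vectors (orthonormal)
● Σ: diagonal matrix of the non-negative singular values, in decreasing order
● V^T: transpose of V, whose columns are the right singular vectors (orthonormal)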
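The factorization can be checked numerically with NumPy's `np.linalg.svd`, which returns `U`, the singular values as a 1-D array `s`, and `Vt` (already transposed). The sketch also shows the truncated form, keeping only the top-`k` singular values for a low-rank approximation; the matrix size and `k` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(6, 4))

# full_matrices=False gives the compact SVD: U is 6x4, Vt is 4x4
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Sigma = np.diag(s)                          # singular values on the diagonal

reconstructed = U @ Sigma @ Vt              # X = U Sigma V^T
print(np.allclose(X, reconstructed))

# Rank-k approximation: keep only the k largest singular values
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.linalg.matrix_rank(X_k))
```

For mean-centered data, the rows of `Vt` are the principal axes used by PCA, which is why PCA is often computed through SVD in practice.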
Feature selection
It is a key part of machine learning and data preprocessing. It helps improve model performance, reduce
overfitting, and decrease training time. There are two main components to it: Feature Ranking and
Subset Selection.
1. Feature Ranking
Techniques:
● Filter Methods:
o Based on statistics like:
▪ Correlation Coefficient
▪ Chi-Squared Test
▪ ANOVA F-test
● Wrapper Methods:
o Use a predictive model to score feature subsets.
o Example: Recursive Feature Elimination (RFE)
● Embedded Methods:
o Feature importance comes from the model itself.
o Examples:
▪ Lasso Regression (L1 regularization)
▪ Tree-based models (Random Forest, XGBoost)
Filter methods in feature selection are techniques that select the most relevant
features from a dataset before training any machine learning model. They rely
purely on statistical properties of the data, not on any learning algorithm. The
key idea is that filter methods evaluate each feature individually based on its relationship
with the target variable (label), and rank or select the best ones.
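A minimal filter-method sketch, assuming a numeric feature matrix and a numeric target: each feature is scored on its own by the absolute Pearson correlation with the target, and the top-`k` are kept without training any model. The function name `rank_by_correlation` and the synthetic data are hypothetical; correlation only detects linear relationships, so a chi-squared or ANOVA F-test would be used instead for categorical settings.

```python
import numpy as np

def rank_by_correlation(X, y, k):
    """Return indices of the k features with the largest |Pearson r| against y."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    ranking = np.argsort(scores)[::-1]   # highest score first
    return ranking[:k], scores

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 5))
# Target depends on features 0 and 3 only; the others are pure noise
y = 3 * X[:, 0] - 2 * X[:, 3] + 0.5 * rng.normal(size=n)
selected, scores = rank_by_correlation(X, y, k=2)
print(sorted(selected.tolist()), scores.round(2))
```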
Advantages:
Wrapper methods in feature selection are techniques that select features by actually training
a model on different subsets of features and evaluating performance. Instead of relying on just
statistical tests (like filter methods), wrappers use the model's performance as a guide to choose
the best features.
Key Idea:
Try multiple combinations of features, train the model for each, and pick the subset that gives
the best results.
1. Forward Selection
● Start with no features; at each step, add the feature that improves performance the most.
● Stop when adding more features doesn’t help.
2. Backward Elimination
● Start with all features; at each step, remove the feature whose removal hurts performance the least.
● Keep removing until performance starts to drop.
3. Recursive Feature Elimination (RFE)
● Repeatedly train a model, rank features by importance, and remove the least important
ones.
● Continue until you reach the desired number of features.
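A minimal wrapper sketch of forward selection, assuming ordinary least squares as the wrapped model and validation-set mean squared error as the score; both choices, the stopping tolerance `tol`, and the names `fit_and_score` and `forward_selection` are illustrative assumptions. Each step retrains the model once per remaining feature, which is why wrappers cost more than filters.

```python
import numpy as np

def fit_and_score(X_tr, y_tr, X_val, y_val):
    """Fit least squares (with intercept) on the training split, return validation MSE."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    A_val = np.column_stack([np.ones(len(X_val)), X_val])
    w, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return np.mean((A_val @ w - y_val) ** 2)

def forward_selection(X_tr, y_tr, X_val, y_val, tol=1e-2):
    selected, remaining = [], list(range(X_tr.shape[1]))
    best_score = np.mean((y_val - y_tr.mean()) ** 2)   # baseline: always predict the mean
    while remaining:
        # Try adding each remaining feature; keep the one giving the lowest validation MSE
        trials = [(fit_and_score(X_tr[:, selected + [j]], y_tr,
                                 X_val[:, selected + [j]], y_val), j) for j in remaining]
        score, j = min(trials)
        if best_score - score < tol:    # stop when adding a feature no longer helps
            break
        best_score = score
        selected.append(j)
        remaining.remove(j)
    return selected, best_score

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 6))
y = 2 * X[:, 1] + X[:, 4] + 0.3 * rng.normal(size=400)
sel, mse = forward_selection(X[:300], y[:300], X[300:], y[300:])
print(sel, round(mse, 3))
```

Backward elimination is the mirror image: start from all columns and drop, at each step, the one whose removal raises the validation error the least.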
Pros:
● Usually more accurate than filter methods because they consider interactions between
features.
● Tailored to a specific model, optimizing for its performance.
Cons:
Embedded methods integrate feature selection into the model training process. That means the
algorithm learns which features are important while it is fitting the model, rather than before training
(like filter methods) or by repeatedly training and evaluating candidate subsets (like wrapper methods).
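To illustrate an embedded method, the sketch below fits Lasso (L1-regularized) linear regression by cyclic coordinate descent in plain NumPy. The L1 penalty drives some coefficients to exactly zero during training, so the features with non-zero weights are the selected ones. The objective scaling, the `alpha` value, and the function names are assumptions for illustration; in practice a library implementation such as scikit-learn's `Lasso` would normally be used.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Minimize (1/2n)||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.
    Assumes the columns of X and the target y are centered (no intercept)."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(d):
            # Partial residual: remove every feature's contribution except feature j
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 8))
X -= X.mean(axis=0)
y = 4 * X[:, 0] - 3 * X[:, 5] + 0.5 * rng.normal(size=200)
y -= y.mean()
w = lasso_cd(X, y, alpha=0.3)
print(np.round(w, 2))
print("selected features:", np.flatnonzero(w))
```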
How It Works:
Now let's move to subset selection, the second component of feature selection.
2. Subset Selection
● Involves choosing a subset of the top-ranked features to build the final model.
● Goal: Keep only the most relevant features while discarding irrelevant or redundant ones.
Approaches:
● Forward Selection:
Start with no features; add, one at a time, the feature that improves the model the most.
● Backward Elimination:
Start with all features, remove the least useful one by one.
● Exhaustive Search:
Try all possible combinations of features (not practical for large sets).
● Greedy Methods / Heuristics:
Use performance metrics (e.g., accuracy, AUC) to select a subset without checking all
combinations.
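The cost difference between exhaustive search and the greedy approaches shows up in the number of model fits: with d features, exhaustive search evaluates every non-empty subset, 2^d - 1 of them. The sketch below assumes the Bayesian information criterion (BIC) of a least-squares fit as the scoring rule (lower is better) and uses `itertools.combinations`; the criterion and the function names are illustrative choices, and the approach is only feasible for small d.

```python
import numpy as np
from itertools import combinations

def bic(X, y):
    """Gaussian BIC of an ordinary least-squares fit with intercept (lower is better)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ w) ** 2)
    return n * np.log(rss / n) + (p + 1) * np.log(n)

def exhaustive_search(X, y, criterion=bic):
    d = X.shape[1]
    best_subset, best_value, n_fits = None, np.inf, 0
    for size in range(1, d + 1):
        for subset in combinations(range(d), size):   # every subset of this size
            value = criterion(X[:, list(subset)], y)
            n_fits += 1
            if value < best_value:
                best_subset, best_value = subset, value
    return best_subset, best_value, n_fits

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 5))
y = 1.5 * X[:, 0] + 2.0 * X[:, 2] + 0.5 * rng.normal(size=150)
subset, value, n_fits = exhaustive_search(X, y)
print(subset, round(value, 2), n_fits)
```

With 5 features this is 31 fits; with 30 features it would be over a billion, which is why greedy forward or backward steps are used instead.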
MODEL SELECTION:
Model selection is a core part of the machine learning pipeline — it's about choosing the best model (or
algorithm) for your specific problem and dataset.
Model selection is the process of:
● Classification:
o Logistic Regression
o Decision Tree / Random Forest
o SVM
o k-NN
o Neural Networks
● Regression:
o Linear Regression
o Ridge/Lasso
o SVR
o Gradient Boosting
5. Compare Models
● Look at:
o Performance metrics
o Bias-variance tradeoff
o Training time
o Interpretability
o Scalability
6. Tune Hyperparameters
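Steps 5 and 6 are often done together with k-fold cross-validation: each candidate (a model, or one hyperparameter setting) is trained on k-1 folds and scored on the held-out fold, and the candidate with the best average score is kept. The sketch below assumes ridge regression as the model, its regularization strength `alpha` as the hyperparameter, and mean squared error as the metric; `ridge_fit`, `cv_mse`, and the `alpha` grid are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression w = (X^T X + alpha I)^-1 X^T y (no intercept; centered data)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def cv_mse(X, y, alpha, k=5, seed=0):
    """Average validation MSE over k folds."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(X[train], y[train], alpha)
        errors.append(np.mean((X[val] @ w - y[val]) ** 2))
    return np.mean(errors)

rng = np.random.default_rng(5)
X = rng.normal(size=(60, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.0, 0.5]
y = X @ true_w + rng.normal(size=60)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = {a: cv_mse(X, y, a) for a in alphas}
best_alpha = min(scores, key=scores.get)
print({a: round(s, 3) for a, s in scores.items()}, "best alpha:", best_alpha)
```

The same loop can compare different model families: swap `ridge_fit` for each candidate learner and keep the fold split fixed (same `seed`) so every model is scored on identical data.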
Module 4:
1. Semi-Supervised Learning
Definition:
Semi-supervised learning is a type of machine learning where the model is trained on a small amount of
labeled data and a large amount of unlabeled data.
Common Techniques:
● Self-training: Train a model on labeled data, use it to predict labels for unlabeled data, retrain
with the combined set.
● Co-training: Use two models to label unlabeled data for each other.
● Graph-based methods: Represent data as a graph and propagate labels through connections.
● Generative models: Learn the joint distribution of inputs and labels.
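A minimal self-training sketch, assuming a nearest-centroid classifier as the base model and a confidence rule based on the gap between the distances to the two closest centroids; the classifier, the `margin_threshold` value, and names such as `self_train` are illustrative assumptions, not a standard API. Each round pseudo-labels only the unlabeled points the model is confident about, adds them to the labeled set, and refits.

```python
import numpy as np

def fit_centroids(X, y):
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict_with_margin(X, classes, centroids):
    # Distance from every point to every class centroid: shape (n_points, n_classes)
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    d_sorted = np.sort(d, axis=1)
    margin = d_sorted[:, 1] - d_sorted[:, 0]     # larger gap = more confident
    return classes[np.argmin(d, axis=1)], margin

def self_train(X_lab, y_lab, X_unlab, margin_threshold=1.0, max_rounds=10):
    X_lab, y_lab, X_unlab = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(max_rounds):
        if len(X_unlab) == 0:
            break
        classes, centroids = fit_centroids(X_lab, y_lab)
        pred, margin = predict_with_margin(X_unlab, classes, centroids)
        confident = margin >= margin_threshold
        if not confident.any():
            break
        # Move confidently pseudo-labeled points into the labeled set, then retrain
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, pred[confident]])
        X_unlab = X_unlab[~confident]
    return fit_centroids(X_lab, y_lab), len(X_lab)

rng = np.random.default_rng(6)
X_all = np.vstack([rng.normal(loc=[-2, 0], size=(100, 2)),
                   rng.normal(loc=[2, 0], size=(100, 2))])
y_all = np.array([0] * 100 + [1] * 100)
labeled = np.array([0, 1, 100, 101])             # only 4 labeled points
unlabeled = np.setdiff1d(np.arange(200), labeled)

(classes, centroids), n_labeled = self_train(X_all[labeled], y_all[labeled], X_all[unlabeled])
pred, _ = predict_with_margin(X_all, classes, centroids)
print("labeled set grew to", n_labeled, "accuracy:", np.mean(pred == y_all))
```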
Reinforcement Learning (RL) is a learning paradigm where an agent learns to make decisions by interacting with an environment
to maximize cumulative reward.
A Markov Decision Process (MDP), the standard formal model of an RL problem, is defined by:
● S: Set of states
● A: Set of actions
● P(s′|s,a): Transition probability function
● R(s,a): Reward function
● γ (gamma): Discount factor (0 ≤ γ ≤ 1)
Bellman Equations
They break down the long-term return into the immediate reward plus the discounted value of what follows. Please see the
respective equations in the class notes.
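For reference, the standard textbook forms in the notation defined above (S, A, P(s′|s,a), R(s,a), γ) are written out below, where π(a|s) is the probability that policy π picks action a in state s. The class notes may use a different but equivalent notation.

```latex
% Bellman expectation equations for a fixed policy \pi
V^{\pi}(s) = \sum_{a \in A} \pi(a \mid s) \Big[ R(s,a) + \gamma \sum_{s' \in S} P(s' \mid s,a)\, V^{\pi}(s') \Big]

Q^{\pi}(s,a) = R(s,a) + \gamma \sum_{s' \in S} P(s' \mid s,a) \sum_{a' \in A} \pi(a' \mid s')\, Q^{\pi}(s',a')

% Bellman optimality equations
V^{*}(s) = \max_{a \in A} \Big[ R(s,a) + \gamma \sum_{s' \in S} P(s' \mid s,a)\, V^{*}(s') \Big]

Q^{*}(s,a) = R(s,a) + \gamma \sum_{s' \in S} P(s' \mid s,a) \max_{a' \in A} Q^{*}(s',a')
```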
Monte Carlo Methods: Use averaged returns from episodes to estimate value functions.
Monte Carlo (MC) methods are used in Reinforcement Learning to estimate the value function
or action-value function of a given policy, by averaging actual returns observed from sample
episodes.
Key Idea:
To evaluate a policy, run the policy many times in the environment, and use the observed
rewards to estimate how good each state (or state-action pair) is.
No need for knowledge of transition probabilities or reward functions — just run episodes
and average results!
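A minimal first-visit Monte Carlo policy-evaluation sketch. It assumes episodes have already been generated by running the policy and are given as lists of (state, reward) pairs, where each reward is the one received after leaving that state; this episode format and the name `mc_first_visit` are assumptions. Returns are accumulated backwards as G = r + γG, and each state's value is the average of the returns that follow its first visit in each episode.

```python
from collections import defaultdict

def mc_first_visit(episodes, gamma=0.9):
    """Estimate V(s) by averaging returns from the first visit to s in each episode."""
    returns = defaultdict(list)
    for episode in episodes:
        G = 0.0
        first_return = {}
        # Walk backwards so G is the discounted return from each step onward
        for state, reward in reversed(episode):
            G = reward + gamma * G
            first_return[state] = G      # earlier visits overwrite later ones
        for state, g in first_return.items():
            returns[state].append(g)
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

episodes = [
    [("A", 0.0), ("B", 1.0)],   # A -> B -> terminal, reward 1 on the last step
    [("A", 0.0), ("B", 0.0)],   # same path, no reward
]
V = mc_first_visit(episodes, gamma=0.9)
print(V)
```

No transition probabilities appear anywhere: the value estimates come only from the sampled rewards.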
Policy Iteration
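Policy iteration alternates two steps until the policy stops changing: policy evaluation (compute V^π for the current policy from the Bellman expectation equation) and policy improvement (switch to the action that is greedy with respect to V^π). The sketch below assumes a small tabular MDP stored as NumPy arrays `P[s, a, s']` and `R[s, a]`, matching the S, A, P, R, γ definition above, and evaluates each policy exactly with a linear solve; the array layout, function name, and 2-state example are hypothetical.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """P: (S, A, S) transition probabilities, R: (S, A) rewards. Returns (policy, V)."""
    n_states, n_actions = R.shape
    policy = np.zeros(n_states, dtype=int)              # start with action 0 everywhere
    while True:
        # Policy evaluation: V = R_pi + gamma * P_pi V  =>  (I - gamma P_pi) V = R_pi
        P_pi = P[np.arange(n_states), policy]            # (S, S)
        R_pi = R[np.arange(n_states), policy]            # (S,)
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V
        Q = R + gamma * P @ V                            # (S, A)
        new_policy = np.argmax(Q, axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

# Hypothetical 2-state MDP: action 0 = stay, action 1 = switch state.
# Staying in state 1 earns reward 1; everything else earns 0.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[0, 1, 1] = P[1, 0, 1] = P[1, 1, 0] = 1.0
R = np.array([[0.0, 0.0], [1.0, 0.0]])
policy, V = policy_iteration(P, R)
print(policy, V)
```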
Q-Learning
Q-learning is a popular model-free reinforcement learning algorithm used to find the optimal
action-selection policy for an agent interacting with an environment.
It tells the agent what to do (which action to take in which state) to maximize total future
reward, even when the agent doesn't know anything about the environment’s dynamics (i.e., the
transition and reward functions).
Core Idea:
Q-learning learns a function called the Q-function, which estimates the expected cumulative discounted reward
for taking a given action a in a given state s, and then following the best policy afterward.
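The Q-learning update moves Q(s, a) toward the target r + γ max over a′ of Q(s′, a′), using the best next action regardless of which action the agent actually takes next (off-policy). The sketch below assumes a hypothetical 5-state corridor environment (`step`: move left or right, reward 1 on reaching state 4), an ε-greedy behavior policy with random tie-breaking, and arbitrary values for the learning rate `alpha`, `epsilon`, and the episode count.

```python
import numpy as np

N_STATES, GOAL = 5, 4          # hypothetical corridor: states 0..4, goal at 4

def step(state, action):
    """Action 0 moves left, 1 moves right. Reward 1 on reaching the goal, which ends the episode."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def epsilon_greedy(Q, s, epsilon, rng):
    if rng.random() < epsilon:
        return int(rng.integers(2))                  # explore
    best = np.flatnonzero(Q[s] == Q[s].max())        # exploit, breaking ties randomly
    return int(rng.choice(best))

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, 2))
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = epsilon_greedy(Q, s, epsilon, rng)
            s_next, r, done = step(s, a)
            # Off-policy target: best action in s_next (no future value after the goal)
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q

Q = q_learning()
print(np.argmax(Q[:GOAL], axis=1))   # greedy action in each non-goal state
```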
SARSA (State–Action–Reward–State–Action)
Key Idea:
SARSA learns the Q-value of a state-action pair based on the action actually taken by the
agent (rather than the best possible one, like in Q-learning).
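For contrast, SARSA's target uses Q(s′, a′) for the action a′ that the ε-greedy policy actually picks next (on-policy) instead of the maximum over actions. The sketch assumes a hypothetical 5-state corridor (move left or right, reward 1 on reaching state 4) and arbitrary hyperparameter values; only the target line and the order of action selection differ from a Q-learning loop.

```python
import numpy as np

N_STATES, GOAL = 5, 4          # hypothetical corridor: states 0..4, goal at 4

def step(state, action):
    """Action 0 moves left, 1 moves right. Reward 1 on reaching the goal, which ends the episode."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def epsilon_greedy(Q, s, epsilon, rng):
    if rng.random() < epsilon:
        return int(rng.integers(2))
    best = np.flatnonzero(Q[s] == Q[s].max())        # break ties randomly
    return int(rng.choice(best))

def sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, 2))
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, epsilon, rng)
        done = False
        while not done:
            s_next, r, done = step(s, a)
            a_next = epsilon_greedy(Q, s_next, epsilon, rng)   # the action actually taken next
            # On-policy target: uses Q of the chosen next action, not the max
            target = r + (0.0 if done else gamma * Q[s_next, a_next])
            Q[s, a] += alpha * (target - Q[s, a])
            s, a = s_next, a_next
    return Q

Q = sarsa()
print(np.argmax(Q[:GOAL], axis=1))
```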
Module 5:
1. Recommender Systems
Recommender systems suggest items (products, movies, articles, etc.) based on user preferences. There
are three main approaches:
A. Collaborative Filtering (CF)
Types:
1. User-based CF
o Finds users with similar tastes (neighbors).
o Recommends items that neighbors liked.
o Similarity metrics: Cosine similarity, Pearson correlation.
2. Item-based CF
o Recommends items similar to what a user liked before.
o More scalable in large datasets than user-based CF.
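A minimal user-based CF sketch on a small, made-up rating matrix (rows = users, columns = items, 0 = not rated). It finds the target user's most similar users by cosine similarity over their rating vectors and predicts an unrated item as the similarity-weighted average of those neighbors' ratings. Treating 0 as "unrated", the neighborhood size `k`, and the function names are simplifying assumptions; Pearson correlation could be swapped in for cosine similarity.

```python
import numpy as np

def cosine_sim(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def predict_user_based(R, user, item, k=2):
    """Predict R[user, item] from the k most similar users who rated the item."""
    candidates = [u for u in range(R.shape[0]) if u != user and R[u, item] > 0]
    sims = sorted(((cosine_sim(R[user], R[u]), u) for u in candidates), reverse=True)[:k]
    num = sum(s * R[u, item] for s, u in sims)
    den = sum(abs(s) for s, u in sims)
    return num / den if den else 0.0

# Hypothetical ratings: 4 users x 4 items, 0 means "not rated"
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 1],
    [1, 1, 5, 4],
    [5, 5, 1, 0],
], dtype=float)

pred = predict_user_based(R, user=0, item=2)
print(round(pred, 2))
```

User 0's closest neighbors (users 1 and 3) both rated item 2 poorly, so the prediction is low even though user 2 loved it.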
Pros:
Cons:
B. Content-Based Filtering
Idea: Recommends items similar to those a user has liked, based on item features.
Process:
1. Extract features from items (e.g., genre, keywords, product type).
2. Create user profiles from items the user has rated or interacted with.
3. Compute similarity between user profile and item profiles.
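The three steps above can be sketched with made-up binary genre features. Here the user profile is built from mean-centered ratings, so liked items pull the profile toward their features and disliked items push it away (one common choice among several), and unrated items are ranked by cosine similarity to the profile. The item names, features, and weighting scheme are hypothetical.

```python
import numpy as np

# Step 1: item features (hypothetical genres: action, comedy, romance)
items = ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"]
features = np.array([
    [1, 0, 0],   # A: action
    [1, 1, 0],   # B: action-comedy
    [0, 1, 1],   # C: romantic comedy
    [0, 0, 1],   # D: romance
    [1, 0, 1],   # E: action-romance
], dtype=float)

# Step 2: user profile from the items the user rated
ratings = {0: 5.0, 3: 1.0}                  # user loved A, disliked D
rated = list(ratings)
weights = np.array([ratings[i] for i in rated])
centered = weights - weights.mean()         # positive for liked, negative for disliked
profile = centered @ features[rated]

# Step 3: cosine similarity between the profile and every unrated item
def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

scores = {items[i]: cosine(profile, features[i]) for i in range(len(items)) if i not in ratings}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```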
Techniques:
Pros:
Cons:
An Artificial Neural Network (ANN) is a computational model inspired by the human brain, used to approximate complex functions.
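To make "approximating a function" concrete, the sketch below trains a tiny fully connected network (one hidden layer of tanh units, one sigmoid output) with plain gradient descent and hand-written backpropagation to learn XOR, which no single linear layer can represent. The layer sizes, learning rate, iteration count, and binary cross-entropy loss are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)          # XOR targets

# One hidden layer of 8 tanh units, one sigmoid output unit
W1 = rng.normal(size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1))
b2 = np.zeros(1)
lr = 1.0

for _ in range(5000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass for mean binary cross-entropy: dL/dz_out = (p - y) / n
    d_out = (p - y) / len(X)
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (1 - h ** 2)                  # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0)
    # Gradient descent step
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(np.round(p.ravel(), 2))
```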
Structure:
Limitation:
MLPs have:
Uses:
Steps:
4. Deep Learning
A subfield of machine learning using multi-layer neural networks to learn high-level representations from
data.
Characteristics:
● Transformers:
o State-of-the-art in NLP (e.g., BERT, GPT).
o Uses the self-attention mechanism.
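The self-attention mechanism can be sketched as single-head scaled dot-product attention, softmax(QKᵀ / sqrt(d_k)) V, where the queries, keys, and values are linear projections of the same input sequence. The random projection matrices, the dimensions, and the omission of masking, multiple heads, and the rest of the Transformer block are simplifying assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)        # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over X (seq_len x d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # similarity of each token to every other token
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 4
X = rng.normal(size=(seq_len, d_model))            # e.g. 4 token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(X, W_q, W_k, W_v)
print(out.shape, weights.sum(axis=1))
```

Each output row is a weighted mix of all value vectors, so every token's new representation can draw on the whole sequence in a single step.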
Applications:
● Image classification
● Natural language processing
● Speech recognition
● Recommender systems
● Autonomous vehicles
Use YouTube videos wherever required; plenty of material is available. The notes above are brief, so
study well.