ML Notes

The document covers key concepts in machine learning, including feature extraction, model selection, semi-supervised learning, reinforcement learning, and recommender systems. It explains techniques like PCA, SVD, and various feature selection methods, as well as the steps involved in model selection and the principles of neural networks and deep learning. Additionally, it discusses the importance of different learning paradigms and their applications in real-world scenarios.


Module 3:

Feature extraction:

Feature extraction in machine learning is the process of transforming raw data into a set of
meaningful features that can be used by a machine learning model to make predictions or
decisions.

Why is Feature Extraction Important?

Raw data (like images, text, audio, or sensor data) is often complex and unstructured. Machine
learning models work better when the input data is represented in a numerical and structured
form. Feature extraction helps by:

●​ Reducing dimensionality (making the data smaller and more manageable)
●​ Highlighting the most relevant information
●​ Improving model performance and accuracy

PCA (Principal Component Analysis)

●​ PCA is a technique used to reduce the number of features (dimensions) in your dataset
while keeping as much important information (variance) as possible.

What PCA does:

●​ Finds new axes (called principal components) that are linear combinations of the original
features.
●​ These axes are chosen so that:
o​ The first one captures the most variance in the data.
o​ The second one captures the most of the remaining variance (and is orthogonal to the first).

Why use PCA?

●​ To simplify your model (fewer features)
●​ To visualize high-dimensional data in 2D or 3D
●​ To remove noise and redundancy
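
For concreteness, here is a minimal PCA sketch using scikit-learn (assuming scikit-learn is installed; the Iris dataset and the choice of 2 components are just for illustration):

```python
# A minimal PCA sketch: scale the data, keep the top 2 principal components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)             # 150 samples x 4 features
X_scaled = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale

pca = PCA(n_components=2)                     # keep the top 2 principal components
X_reduced = pca.fit_transform(X_scaled)       # shape: (150, 2)

print(X_reduced.shape)
print(pca.explained_variance_ratio_)          # fraction of variance kept per component
```
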
SVD (Singular Value Decomposition)

SVD is a mathematical technique from linear algebra. It factorizes any matrix into three smaller matrices:

A = UΣVᵀ

Where:

●​ A is your original data matrix (e.g., rows = samples, columns = features)
●​ U and V are orthogonal matrices
●​ Σ (Sigma) is a diagonal matrix of singular values (like the importance of each component)

What SVD does:

●​ Breaks down the data into its core components
●​ Lets you reconstruct or compress the data by keeping only the largest singular values

SVD sample numerical link: Singular Value Decomposition (Numerical)
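
A small NumPy illustration of the factorization and of compression by truncating singular values (the matrix entries here are made up):

```python
# Factorize A with SVD, then rebuild a rank-k approximation from the
# largest singular values (low-rank compression).
import numpy as np

A = np.array([[3., 1., 1.],
              [-1., 3., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vt

k = 1                                             # keep only the largest singular value
A_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k approximation of A

print(s)         # singular values, sorted in decreasing order
print(A_approx)  # compressed reconstruction of A
```
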

Relationship Between PCA and SVD

PCA and SVD are closely related:

●​ PCA is often computed using SVD under the hood.
●​ When you apply PCA to your mean-centered data matrix X, you're essentially computing:

X = UΣVᵀ

and then selecting the top k columns of V (the principal components).
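
A quick sketch verifying this relationship on toy data (the data and variable names are illustrative):

```python
# The top-k right singular vectors of the centered data matrix are the
# principal components; projecting onto them equals U[:, :k] * S[:k].
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 samples, 5 features (toy data)

Xc = X - X.mean(axis=0)                # center the data (required for PCA)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
components = Vt[:k]                    # top-k principal components (rows of Vᵀ)
X_reduced = Xc @ components.T          # PCA projection onto k components

print(np.allclose(X_reduced, U[:, :k] * S[:k]))  # True
```
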

Feature selection
It is a key part of machine learning and data preprocessing. It helps improve model performance, reduce
overfitting, and decrease training time. There are two main components to it: Feature Ranking and
Subset Selection.

1. Feature Ranking

●​ Ranks features based on their importance or relevance to the output/target.
●​ Each feature gets a score, and features are ranked from most to least important.

Techniques:

●​ Filter Methods:
o​ Based on statistics like:
▪​ Correlation Coefficient
▪​ Chi-Squared Test
▪​ ANOVA F-test
●​ Wrapper Methods:
o​ Use a predictive model to score feature subsets.
o​ Example: Recursive Feature Elimination (RFE)
●​ Embedded Methods:
o​ Feature importance comes from the model itself.
o​ Examples:
▪​ Lasso Regression (L1 regularization)
▪​ Tree-based models (Random Forest, XGBoost)

Filter methods in feature selection are techniques that select the most relevant
features from a dataset before training any machine learning model. They rely
purely on statistical properties of the data, not on any learning algorithm. The
key idea is that filter methods evaluate each feature individually based on its relationship
with the target variable (label), and rank or select the best ones.

Advantages:

●​ Fast and computationally cheap
●​ Model-agnostic (works before any model is trained)
●​ Helps reduce overfitting and improves model performance
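
As a concrete sketch, scikit-learn's SelectKBest can apply one of these statistical tests before any model is trained (here the ANOVA F-test; the dataset and k=2 are just for illustration):

```python
# A filter method: rank features with the ANOVA F-test and keep the top 2.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

selector = SelectKBest(score_func=f_classif, k=2)  # statistical test, no model needed
X_selected = selector.fit_transform(X, y)

print(selector.scores_)        # F-scores: one relevance score per feature
print(selector.get_support())  # boolean mask of the 2 selected features
```
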

Wrapper methods in feature selection are techniques that select features by actually training
a model on different subsets of features and evaluating performance. Instead of relying on just
statistical tests (like filter methods), wrappers use the model's performance as a guide to choose
the best features.
Key Idea:

Try multiple combinations of features, train the model for each, and pick the subset that gives
the best results.

Common Wrapper Methods:


1. Forward Selection

●​ Start with no features, add one feature at a time that improves performance the most.
●​ Stop when adding more features doesn’t help.

2. Backward Elimination

●​ Start with all features, remove one at a time that hurts performance the least.
●​ Keep removing until performance drops.

3. Recursive Feature Elimination (RFE)

●​ Repeatedly trains a model, ranks features by importance, and removes the least important
ones.
●​ Continue until you reach the desired number of features.

Pros:

●​ Usually more accurate than filter methods because they consider interactions between
features.
●​ Tailored to a specific model, optimizing for its performance.

Cons:

●​ Computationally expensive, especially with many features.
●​ Can easily overfit if not done carefully.
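
A hedged sketch of RFE with scikit-learn, using logistic regression as the underlying model (the dataset and the target of 5 features are illustrative choices):

```python
# Recursive Feature Elimination: repeatedly fit the model, drop the
# weakest feature(s), and refit until 5 features remain.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)  # 30 features

model = LogisticRegression(max_iter=5000)
rfe = RFE(estimator=model, n_features_to_select=5)
rfe.fit(X, y)

print(rfe.support_)   # mask of the 5 surviving features
print(rfe.ranking_)   # rank 1 = selected; higher = eliminated earlier
```
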

Embedded Methods in Feature Selection

Embedded methods integrate feature selection into the model training process. That means the
algorithm learns which features are important while it's fitting the model — rather than before (like
filter methods) or after (like wrapper methods).
How It Works:

●​ The model has a built-in way to weigh or penalize features.
●​ Features that are less useful get smaller weights, or are even eliminated during training.
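
For example, a minimal Lasso sketch (scikit-learn assumed; the synthetic data is illustrative), where the L1 penalty zeroes out unhelpful features during fitting:

```python
# An embedded method: Lasso (L1) drives the weights of unhelpful features
# to exactly zero while the model is being fit.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Toy data: 10 features, but only 3 actually influence the target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0)  # alpha controls the strength of the L1 penalty
lasso.fit(X, y)

print(lasso.coef_)                          # many coefficients are exactly 0
print("kept:", np.nonzero(lasso.coef_)[0])  # indices of surviving features
```
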

Now let's move to subset selection, the second component of feature selection.

2. Subset Selection

●​ Involves choosing a subset of the top-ranked features to build the final model.
●​ Goal: Keep only the most relevant features while discarding irrelevant or redundant ones.

Approaches:

●​ Forward Selection:​
Start with no features, add one at a time that improves the model the most.
●​ Backward Elimination:​
Start with all features, remove the least useful one by one.
●​ Exhaustive Search:​
Try all possible combinations of features (not practical for large sets).
●​ Greedy Methods / Heuristics:​
Use performance metrics (e.g., accuracy, AUC) to select a subset without checking all
combinations.

MODEL SELECTION:
Model selection is a core part of the machine learning pipeline — it's about choosing the best model (or
algorithm) for your specific problem and dataset.
Model selection is the process of:

●​ Trying out different ML models
●​ Comparing their performance
●​ Choosing the one that best balances accuracy, generalization, and computational efficiency

Steps in Model Selection


1. Define the Problem

●​ Classification? Regression? Clustering?
●​ That decides the type of models to consider.

2. Choose Candidate Models

●​ Classification:
o​ Logistic Regression
o​ Decision Tree / Random Forest
o​ SVM
o​ k-NN
o​ Neural Networks
●​ Regression:
o​ Linear Regression
o​ Ridge/Lasso
o​ SVR
o​ Gradient Boosting

3. Split the Data

●​ Train/Test split or Cross-validation
●​ This helps evaluate model performance on unseen data

4. Train and Evaluate Models

●​ Use evaluation metrics:
o​ Classification: Accuracy, Precision, Recall, F1-score, ROC-AUC
o​ Regression: MSE, RMSE, MAE, R²

5. Compare Models

●​ Look at:
o​ Performance metrics
o​ Bias-variance tradeoff
o​ Training time
o​ Interpretability
o​ Scalability

6. Tune Hyperparameters

●​ Use Grid Search or Randomized Search
●​ Often combined with cross-validation

7. Select the Best Model

●​ Final model = best performance on validation/test data
●​ Test again on hold-out data (if available)
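
A condensed sketch of steps 3-7 with scikit-learn (the dataset, candidate models, and parameter grid are illustrative choices, not prescriptions):

```python
# Compare candidate models with cross-validation, then tune the winner
# with a grid search (also cross-validated).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Steps 3-5: evaluate each candidate on the same 5 folds.
candidates = {
    "logreg": LogisticRegression(max_iter=5000),
    "forest": RandomForestClassifier(random_state=0),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(name, scores.mean())

# Step 6: hyperparameter tuning for one candidate via grid search + CV.
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid={"n_estimators": [100, 300],
                                "max_depth": [None, 5]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```
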

Module 4:
1. Semi-Supervised Learning
Definition:​
Semi-supervised learning is a type of machine learning where the model is trained on a small amount of
labeled data and a large amount of unlabeled data.

Why it's useful:

●​ Labeling data is expensive and time-consuming.
●​ Unlabeled data is often abundant.

Common Techniques:

●​ Self-training: Train a model on labeled data, use it to predict labels for unlabeled data, retrain with the combined set (a sketch follows this list).
●​ Co-training: Use two models to label unlabeled data for each other.
●​ Graph-based methods: Represent data as a graph and propagate labels through connections.
●​ Generative models: Learn joint distribution of inputs and labels.
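
A minimal self-training sketch with scikit-learn's SelfTrainingClassifier, which marks unlabeled points with the label -1 (the 10% labeling fraction is just for illustration):

```python
# Self-training: the base learner labels its own most-confident unlabeled
# points and is retrained on the growing labeled set.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Pretend most labels are missing: keep only ~10% labeled.
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) > 0.1] = -1  # -1 means "unlabeled"

base = SVC(probability=True)              # base learner must output probabilities
self_training = SelfTrainingClassifier(base, threshold=0.9)
self_training.fit(X, y_partial)           # iteratively labels confident points

print(self_training.score(X, y))          # accuracy against the true labels
```
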

2. Reinforcement Learning (RL)

RL is a learning paradigm where an agent learns to make decisions by interacting with an environment
to maximize cumulative reward.

Markov Decision Process (MDP)

An MDP provides a mathematical framework for modeling decision-making.

Defined by:

●​ S: Set of states
●​ A: Set of actions
●​ P(s′|s,a): Transition probability function
●​ R(s,a): Reward function
●​ γ (gamma): Discount factor (0 ≤ γ ≤ 1)

Goal: Find a policy π(s) that maximizes the expected return.

Bellman Equations

Bellman equations define the value of a state under a certain policy. They break a long-term return down into immediate reward + discounted future value:

●​ State-Value Function (Vπ):

Vπ(s) = Σ_a π(a|s) Σ_s′ P(s′|s,a) [ R(s,a) + γ Vπ(s′) ]

●​ Action-Value Function (Qπ):

Qπ(s,a) = Σ_s′ P(s′|s,a) [ R(s,a) + γ Σ_a′ π(a′|s′) Qπ(s′,a′) ]

(These are the standard forms; cross-check against the respective equations in the class notes.)

Policy Evaluation using Monte Carlo

Monte Carlo Methods: Use averaged returns from episodes to estimate value functions.

●​ No need for transition probabilities.
●​ Works by sampling complete episodes.
●​ Only works for episodic tasks.

Monte Carlo (MC) methods are used in Reinforcement Learning to estimate the value function
or action-value function of a given policy, by averaging actual returns observed from sample
episodes.

Key Idea:

To evaluate a policy, run the policy many times in the environment, and use the observed
rewards to estimate how good each state (or state-action pair) is.

No need for knowledge of transition probabilities or reward functions — just run episodes
and average results!

Monte Carlo Policy Evaluation (State-Value Version)

1.​ Generate episodes: Run multiple episodes using the policy.
2.​ Record returns: For each visit to a state s, compute the total reward from that point onward.
3.​ Average returns: Estimate the value of each state as the average of all returns following that state.
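
A toy sketch of (every-visit) Monte Carlo evaluation on a made-up 1-D random walk; the environment and all names are illustrative, not from the notes:

```python
# States 0..4; episodes end at either edge; reward +1 only for reaching
# state 4. We estimate V by averaging observed returns per visited state.
import random
from collections import defaultdict

GAMMA = 1.0
TERMINAL = {0, 4}

def run_episode(start=2):
    """Follow a fixed random policy and record (state, reward) pairs."""
    state, trajectory = start, []
    while state not in TERMINAL:
        next_state = state + random.choice([-1, 1])
        reward = 1.0 if next_state == 4 else 0.0
        trajectory.append((state, reward))
        state = next_state
    return trajectory

returns = defaultdict(list)
for _ in range(10000):
    G = 0.0
    for state, reward in reversed(run_episode()):
        G = reward + GAMMA * G    # return from this visit onward
        returns[state].append(G)  # every-visit MC: record at each visit

V = {s: sum(g) / len(g) for s, g in returns.items()}
print(V)  # V[2] should come out near 0.5, V[1] near 0.25, V[3] near 0.75
```
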

Policy Iteration

1.​ Policy Evaluation: Estimate Vπ for the current policy π.
2.​ Policy Improvement: Update the policy by acting greedily with respect to Vπ.

Repeat until the policy stops changing.

Q-learning is a popular model-free reinforcement learning algorithm used to find the optimal
action-selection policy for an agent interacting with an environment.
It tells the agent what to do (which action to take in which state) to maximize total future
reward, even when the agent doesn't know anything about the environment’s dynamics (i.e., the
transition and reward functions).

Core Idea:

Q-learning learns a function called the Q-function, which estimates the expected future reward
for taking a given action a in a given state s, and then following the best policy afterward.
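
A compact Q-learning sketch on a made-up 5-state chain. The heart of it is the standard update rule Q(s,a) ← Q(s,a) + α[r + γ max_a′ Q(s′,a′) − Q(s,a)]; the environment itself is purely illustrative:

```python
# Q-learning on a toy chain: move left/right, reward +1 for reaching the
# right end (state 4). Episodes also end if the agent falls off at state 0.
import random

N_STATES, ACTIONS = 5, [-1, +1]       # actions: step left / step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state in (0, N_STATES - 1)
    return next_state, reward, done

for _ in range(2000):
    state, done = 2, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # off-policy update: bootstrap from the BEST next action
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                       - Q[(state, action)])
        state = next_state

# Learned greedy policy for the non-terminal states (should all be +1).
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(1, 4)})
```
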

SARSA stands for:

State–Action–Reward–State–Action

It is a model-free, on-policy reinforcement learning algorithm used to learn an action-value function (Q-values), similar to Q-learning, but with a key difference in how it updates its values.

Key Idea:

SARSA learns the Q-value of a state-action pair based on the action actually taken by the
agent (rather than the best possible one, like in Q-learning).
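
A SARSA sketch on the same made-up chain as the Q-learning example; the only substantive change is the update target, which bootstraps from the action the agent actually takes next:

```python
# SARSA (on-policy): the target is Q(s', a') for the next action actually
# chosen, not max over actions as in Q-learning.
import random

N_STATES, ACTIONS = 5, [-1, +1]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state in (0, N_STATES - 1)

def choose(state):
    """Epsilon-greedy action under the current Q-values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for _ in range(2000):
    state, done = 2, False
    action = choose(state)                # pick the first action up front
    while not done:
        next_state, reward, done = step(state, action)
        next_action = choose(next_state)  # the action we WILL take next
        # on-policy update: target uses Q(s', a'), not a max over actions
        Q[(state, action)] += ALPHA * (reward + GAMMA * Q[(next_state, next_action)]
                                       - Q[(state, action)])
        state, action = next_state, next_action

print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(1, 4)})
```
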

Module 5:
1. Recommender Systems

Recommender systems suggest items (products, movies, articles, etc.) based on user preferences. Two main approaches are:

A. Collaborative Filtering (CF)

Idea: Uses the preferences of similar users or items to make recommendations.

Types:

1.​ User-based CF
o​ Finds users with similar tastes (neighbors).
o​ Recommends items that neighbors liked.
o​ Similarity metrics: Cosine similarity, Pearson correlation.
2.​ Item-based CF
o​ Recommends items similar to what a user liked before.
o​ More scalable in large datasets than user-based CF.

Pros:

●​ No need for item metadata or content.
●​ Learns from actual user behavior.

Cons:

●​ Cold start: New users/items have no data.
●​ Sparsity: User-item matrix often has many missing values.
●​ Scalability: High computation cost with many users/items.
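
A tiny user-based CF sketch on a made-up ratings matrix (0 = not rated), scoring unseen items for one user via cosine-weighted neighbor ratings:

```python
# User-based collaborative filtering in miniature: find how similar the
# other users are to user 0, then average their ratings by similarity.
# (As a simplification, unrated items count as 0 in the weighted average.)
import numpy as np

ratings = np.array([
    [5, 4, 0, 1],   # user 0 (we recommend for this user)
    [4, 5, 4, 0],   # user 1
    [1, 0, 5, 4],   # user 2
], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

target = ratings[0]
sims = np.array([cosine(target, ratings[i]) for i in range(1, 3)])

# Predicted score per item = similarity-weighted average of neighbor ratings.
predictions = sims @ ratings[1:] / sims.sum()

unseen = target == 0  # items user 0 hasn't rated
print("predicted scores for unseen items:", predictions[unseen])
```
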

B. Content-Based Filtering

Idea: Recommends items similar to those a user has liked, based on item features.

Process:

1.​ Extract features from items (e.g., genre, keywords, product type).
2.​ Create user profiles from items the user has rated or interacted with.
3.​ Compute similarity between user profile and item profiles.

Techniques:

●​ TF-IDF (text-based features).
●​ Cosine similarity.
●​ Machine learning models (Naive Bayes, SVMs).

Pros:

●​ Can recommend new/unpopular items.
●​ Personalized to each user.

Cons:

●​ Requires good feature extraction.
●​ Limited diversity; may recommend similar types only.
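
A minimal content-based sketch (scikit-learn assumed; the item descriptions are invented): build TF-IDF vectors for the items, average the liked items into a user profile, and rank the rest by cosine similarity:

```python
# Content-based filtering in miniature: user profile = mean TF-IDF vector
# of liked items; recommend the most similar unliked items.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

items = [
    "action movie with car chases and explosions",
    "romantic comedy set in paris",
    "sci-fi action thriller with space battles",
    "quiet french romance drama",
]
liked = [0]  # the user liked item 0

tfidf = TfidfVectorizer()
item_vecs = tfidf.fit_transform(items)  # sparse TF-IDF feature matrix

# User profile = mean of the liked items' TF-IDF vectors.
user_profile = np.asarray(item_vecs[liked].mean(axis=0))

scores = cosine_similarity(user_profile, item_vecs)[0]
for idx in scores.argsort()[::-1]:      # best matches first
    if idx not in liked:
        print(round(scores[idx], 2), items[idx])
```
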

2. Artificial Neural Networks (ANNs)

An ANN is a computational model inspired by the human brain, used to approximate complex functions.

A. Perceptron (Single-Layer Neural Network)

Structure:

●​ Input layer → Output node
●​ Weights assigned to inputs.
●​ Activation function determines output.

Limitation:

●​ Can only solve linearly separable problems.
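
A from-scratch perceptron sketch learning the (linearly separable) AND function with the classic update rule; the learning rate and epoch count are arbitrary illustrative choices:

```python
# Perceptron: weighted sum, step activation, and the update rule
# w <- w + lr * (target - output) * x.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])  # AND truth table

w = np.zeros(2)
b = 0.0
lr = 0.1

for _ in range(20):                          # a few passes over the data
    for xi, target in zip(X, y):
        output = 1 if xi @ w + b > 0 else 0  # step activation
        error = target - output
        w += lr * error * xi                 # perceptron update rule
        b += lr * error

print(w, b)
print([1 if xi @ w + b > 0 else 0 for xi in X])  # matches y
```
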

B. Multilayer Perceptron (MLP)

MLPs have:

●​ One input layer
●​ One or more hidden layers
●​ One output layer

Each neuron is connected to all neurons in the next layer.

Uses:

●​ Can model complex, non-linear relationships.
●​ Fundamental to deep learning models.

3. Backpropagation

A supervised learning algorithm for training MLPs.

Steps:

1.​ Forward Pass:
o​ Compute the output by passing inputs through the network.
2.​ Loss Calculation:
o​ Compute the error between predicted and actual output.
o​ Common loss functions: MSE, Cross-Entropy.
3.​ Backward Pass:
o​ Compute gradients using the chain rule.
o​ Propagate errors backward through the network.

Optimization Algorithms:

●​ Stochastic Gradient Descent (SGD)
●​ Adam, RMSProp, etc.
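
A from-scratch sketch of the three steps above on the XOR problem, with sigmoid activations and MSE loss (the hidden-layer size, learning rate, and epoch count are arbitrary illustrative choices):

```python
# Backpropagation by hand: forward pass, loss, backward pass (chain rule),
# then a plain gradient-descent step.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)    # hidden layer (4 units)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)    # output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 1.0

for epoch in range(10000):
    # 1. Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # 2. Loss calculation (mean squared error)
    loss = np.mean((out - y) ** 2)

    # 3. Backward pass: chain rule, layer by layer
    d_out = (out - y) * out * (1 - out)  # error signal at the output
    d_h = (d_out @ W2.T) * h * (1 - h)   # error propagated to the hidden layer

    # Gradient descent step
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

# Outputs should approach [0, 1, 1, 0]; an occasional run may stall in a
# local minimum, since MSE + sigmoid on XOR is not guaranteed to converge.
print(loss, out.ravel().round(2))
```
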

4. Deep Learning

A subfield of machine learning using multi-layer neural networks to learn high-level representations from
data.

Characteristics:

●​ Composed of many layers (deep).
●​ Learns from raw data (minimal feature engineering).
●​ Performs well on large datasets and complex tasks.

Common Deep Learning Architectures:

●​ Convolutional Neural Networks (CNNs):
o​ For image and spatial data.
o​ Layers: Conv, Pooling, Fully connected.
●​ Recurrent Neural Networks (RNNs):
o​ For sequential data (text, time series).
o​ Maintain memory across steps.
o​ LSTM and GRU improve on basic RNNs.

●​ Transformers:
o​ State-of-the-art in NLP (e.g., BERT, GPT).
o​ Uses a self-attention mechanism.

Applications:

●​ Image classification
●​ Natural language processing
●​ Speech recognition
●​ Recommender systems
●​ Autonomous vehicles

Take help from YouTube videos wherever required; plenty of material is available. The above notes are brief, so study well.
