ML Notes

The document covers key concepts in machine learning, including feature extraction, model selection, semi-supervised learning, reinforcement learning, and recommender systems. It explains techniques like PCA, SVD, and various feature selection methods, as well as the steps involved in model selection and the principles of neural networks and deep learning. Additionally, it discusses the importance of different learning paradigms and their applications in real-world scenarios.


Module 3:

Feature extraction:

Feature extraction in machine learning is the process of transforming raw data into a set of
meaningful features that can be used by a machine learning model to make predictions or
decisions.

Why is Feature Extraction Important?

Raw data (like images, text, audio, or sensor data) is often complex and unstructured. Machine
learning models work better when the input data is represented in a numerical and structured
form. Feature extraction helps by:

●​ Reducing dimensionality (making the data smaller and more manageable)
●​ Highlighting the most relevant information
●​ Improving model performance and accuracy

PCA (Principal Component Analysis)

●​ PCA is a technique used to reduce the number of features (dimensions) in your dataset
while keeping as much important information (variance) as possible.

What PCA does:

●​ Finds new axes (called principal components) that are linear combinations of the original
features.
●​ These axes are chosen so that:
o​ The first one captures the most variance in the data.
o​ The second one captures the most of the remaining variance (and is orthogonal to the first).

Why use PCA?

●​ To simplify your model (fewer features)
●​ To visualize high-dimensional data in 2D or 3D
●​ To remove noise and redundancy
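
For concreteness, here is a minimal PCA sketch using scikit-learn (assuming scikit-learn is installed; the Iris dataset and the choice of 2 components are just for illustration):

```python
# A minimal PCA sketch: scale the data, keep the top 2 principal components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)             # 150 samples x 4 features
X_scaled = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale

pca = PCA(n_components=2)                     # keep the top 2 principal components
X_reduced = pca.fit_transform(X_scaled)       # shape: (150, 2)

print(X_reduced.shape)
print(pca.explained_variance_ratio_)          # fraction of variance kept per component
```
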
SVD (Singular Value Decomposition)

SVD is a mathematical technique from linear algebra. It factorizes any matrix into three smaller matrices:

A = UΣVᵀ

Where:

●​ A is your original data matrix (e.g., rows = samples, columns = features)
●​ U and V are orthogonal matrices
●​ Σ (Sigma) is a diagonal matrix of singular values (like the importance of each component)

What SVD does:

●​ Breaks down the data into its core components
●​ Lets you reconstruct or compress the data by keeping only the largest singular values

SVD sample numerical link: Singular Value Decomposition (Numerical)
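
A small NumPy illustration of the factorization and of compression by truncating singular values (the matrix entries here are made up):

```python
# Factorize A with SVD, then rebuild a rank-k approximation from the
# largest singular values (low-rank compression).
import numpy as np

A = np.array([[3., 1., 1.],
              [-1., 3., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vt

k = 1                                             # keep only the largest singular value
A_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k approximation of A

print(s)         # singular values, sorted in decreasing order
print(A_approx)  # compressed reconstruction of A
```
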

Relationship Between PCA and SVD

PCA and SVD are closely related:

●​ PCA is often computed using SVD under the hood.
●​ When you apply PCA to your mean-centered data matrix X, you're essentially computing:

X = UΣVᵀ

and then selecting the top k columns of V (the principal components).
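
A quick sketch verifying this relationship on toy data (the data and variable names are illustrative):

```python
# The top-k right singular vectors of the centered data matrix are the
# principal components; projecting onto them equals U[:, :k] * S[:k].
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 samples, 5 features (toy data)

Xc = X - X.mean(axis=0)                # center the data (required for PCA)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
components = Vt[:k]                    # top-k principal components (rows of Vᵀ)
X_reduced = Xc @ components.T          # PCA projection onto k components

print(np.allclose(X_reduced, U[:, :k] * S[:k]))  # True
```
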

Feature selection
It is a key part of machine learning and data preprocessing. It helps improve model performance, reduce
overfitting, and decrease training time. There are two main components to it: Feature Ranking and
Subset Selection.

1. Feature Ranking

●​ Ranks features based on their importance or relevance to the output/target.
●​ Each feature gets a score, and features are ranked from most to least important.

Techniques:

●​ Filter Methods:
o​ Based on statistics like:
▪​ Correlation Coefficient
▪​ Chi-Squared Test
▪​ ANOVA F-test
●​ Wrapper Methods:
o​ Use a predictive model to score feature subsets.
o​ Example: Recursive Feature Elimination (RFE)
●​ Embedded Methods:
o​ Feature importance comes from the model itself.
o​ Examples:
▪​ Lasso Regression (L1 regularization)
▪​ Tree-based models (Random Forest, XGBoost)

Filter methods in feature selection are techniques that select the most relevant
features from a dataset before training any machine learning model. They rely
purely on statistical properties of the data, not on any learning algorithm. The
key idea is that filter methods evaluate each feature individually based on its relationship
with the target variable (label), and rank or select the best ones.

Advantages:

●​ Fast and computationally cheap
●​ Model-agnostic (works before any model is trained)
●​ Helps reduce overfitting and improves model performance
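
As a concrete sketch, scikit-learn's SelectKBest can apply one of these statistical tests before any model is trained (here the ANOVA F-test; the dataset and k=2 are just for illustration):

```python
# A filter method: rank features with the ANOVA F-test and keep the top 2.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

selector = SelectKBest(score_func=f_classif, k=2)  # statistical test, no model needed
X_selected = selector.fit_transform(X, y)

print(selector.scores_)        # F-scores: one relevance score per feature
print(selector.get_support())  # boolean mask of the 2 selected features
```
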

Wrapper methods in feature selection are techniques that select features by actually training
a model on different subsets of features and evaluating performance. Instead of relying on just
statistical tests (like filter methods), wrappers use the model's performance as a guide to choose
the best features.
Key Idea:

Try multiple combinations of features, train the model for each, and pick the subset that gives
the best results.

Common Wrapper Methods:


1. Forward Selection

●​ Start with no features, add one feature at a time that improves performance the most.
●​ Stop when adding more features doesn’t help.

2. Backward Elimination

●​ Start with all features, remove one at a time that hurts performance the least.
●​ Keep removing until performance drops.

3. Recursive Feature Elimination (RFE)

●​ Repeatedly trains a model, ranks features by importance, and removes the least important
ones.
●​ Continue until you reach the desired number of features.

Pros:

●​ Usually more accurate than filter methods because they consider interactions between
features.
●​ Tailored to a specific model, optimizing for its performance.

Cons:

●​ Computationally expensive, especially with many features.
●​ Can easily overfit if not done carefully.
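
A hedged sketch of RFE with scikit-learn, using logistic regression as the underlying model (the dataset and the target of 5 features are illustrative choices):

```python
# Recursive Feature Elimination: repeatedly fit the model, drop the
# weakest feature(s), and refit until 5 features remain.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)  # 30 features

model = LogisticRegression(max_iter=5000)
rfe = RFE(estimator=model, n_features_to_select=5)
rfe.fit(X, y)

print(rfe.support_)   # mask of the 5 surviving features
print(rfe.ranking_)   # rank 1 = selected; higher = eliminated earlier
```
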

Embedded Methods in Feature Selection

Embedded methods integrate feature selection into the model training process. That means the
algorithm learns which features are important while it's fitting the model — rather than before (like
filter methods) or after (like wrapper methods).
How It Works:

●​ The model has a built-in way to weigh or penalize features.
●​ Features that are less useful get smaller weights, or are even eliminated during training.
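
For example, a minimal Lasso sketch (scikit-learn assumed; the synthetic data is illustrative), where the L1 penalty zeroes out unhelpful features during fitting:

```python
# An embedded method: Lasso (L1) drives the weights of unhelpful features
# to exactly zero while the model is being fit.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Toy data: 10 features, but only 3 actually influence the target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0)  # alpha controls the strength of the L1 penalty
lasso.fit(X, y)

print(lasso.coef_)                          # many coefficients are exactly 0
print("kept:", np.nonzero(lasso.coef_)[0])  # indices of surviving features
```
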

Now let's move to subset selection, the second component of feature selection.

2. Subset Selection

●​ Involves choosing a subset of the top-ranked features to build the final model.
●​ Goal: Keep only the most relevant features while discarding irrelevant or redundant ones.

Approaches:

●​ Forward Selection:​
Start with no features, add one at a time that improves the model the most.
●​ Backward Elimination:​
Start with all features, remove the least useful one by one.
●​ Exhaustive Search:​
Try all possible combinations of features (not practical for large sets).
●​ Greedy Methods / Heuristics:​
Use performance metrics (e.g., accuracy, AUC) to select a subset without checking all
combinations.

MODEL SELECTION:
Model selection is a core part of the machine learning pipeline — it's about choosing the best model (or
algorithm) for your specific problem and dataset.
Model selection is the process of:

●​ Trying out different ML models
●​ Comparing their performance
●​ Choosing the one that best balances accuracy, generalization, and computational efficiency

Steps in Model Selection


1. Define the Problem

●​ Classification? Regression? Clustering?
●​ That decides the type of models to consider.

2. Choose Candidate Models

●​ Classification:
o​ Logistic Regression
o​ Decision Tree / Random Forest
o​ SVM
o​ k-NN
o​ Neural Networks
●​ Regression:
o​ Linear Regression
o​ Ridge/Lasso
o​ SVR
o​ Gradient Boosting

3. Split the Data

●​ Train/Test split or Cross-validation
●​ This helps evaluate model performance on unseen data

4. Train and Evaluate Models

●​ Use evaluation metrics:
o​ Classification: Accuracy, Precision, Recall, F1-score, ROC-AUC
o​ Regression: MSE, RMSE, MAE, R²

5. Compare Models

●​ Look at:
o​ Performance metrics
o​ Bias-variance tradeoff
o​ Training time
o​ Interpretability
o​ Scalability

6. Tune Hyperparameters

●​ Use Grid Search or Randomized Search
●​ Often combined with cross-validation

7. Select the Best Model

●​ Final model = best performance on validation/test data
●​ Test again on hold-out data (if available)
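
A condensed sketch of steps 3-7 with scikit-learn (the dataset, candidate models, and parameter grid are illustrative choices, not prescriptions):

```python
# Compare candidate models with cross-validation, then tune the winner
# with a grid search (also cross-validated).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Steps 3-5: evaluate each candidate on the same 5 folds.
candidates = {
    "logreg": LogisticRegression(max_iter=5000),
    "forest": RandomForestClassifier(random_state=0),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(name, scores.mean())

# Step 6: hyperparameter tuning for one candidate via grid search + CV.
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid={"n_estimators": [100, 300],
                                "max_depth": [None, 5]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```
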

Module 4:
1. Semi-Supervised Learning
Definition:​
Semi-supervised learning is a type of machine learning where the model is trained on a small amount of
labeled data and a large amount of unlabeled data.

Why it's useful:

●​ Labeling data is expensive and time-consuming.
●​ Unlabeled data is often abundant.

Common Techniques:

●​ Self-training: Train a model on labeled data, use it to predict labels for unlabeled data, retrain with the combined set (a sketch follows this list).
●​ Co-training: Use two models to label unlabeled data for each other.
●​ Graph-based methods: Represent data as a graph and propagate labels through connections.
●​ Generative models: Learn joint distribution of inputs and labels.
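
A minimal self-training sketch with scikit-learn's SelfTrainingClassifier, which marks unlabeled points with the label -1 (the 10% labeling fraction is just for illustration):

```python
# Self-training: the base learner labels its own most-confident unlabeled
# points and is retrained on the growing labeled set.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Pretend most labels are missing: keep only ~10% labeled.
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) > 0.1] = -1  # -1 means "unlabeled"

base = SVC(probability=True)              # base learner must output probabilities
self_training = SelfTrainingClassifier(base, threshold=0.9)
self_training.fit(X, y_partial)           # iteratively labels confident points

print(self_training.score(X, y))          # accuracy against the true labels
```
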

2. Reinforcement Learning (RL)

RL is a learning paradigm where an agent learns to make decisions by interacting with an environment
to maximize cumulative reward.

Markov Decision Process (MDP)

An MDP provides a mathematical framework for modeling decision-making.

Defined by:

●​ S: Set of states
●​ A: Set of actions
●​ P(s′|s,a): Transition probability function
●​ R(s,a): Reward function
●​ γ (gamma): Discount factor (0 ≤ γ ≤ 1)

Goal: Find a policy π(s) that maximizes the expected return.

Bellman Equations

Bellman equations define the value of a state under a certain policy. They break a long-term return down into immediate reward + discounted future value:

●​ State-Value Function (Vπ):

Vπ(s) = Σ_a π(a|s) Σ_s′ P(s′|s,a) [ R(s,a) + γ Vπ(s′) ]

●​ Action-Value Function (Qπ):

Qπ(s,a) = Σ_s′ P(s′|s,a) [ R(s,a) + γ Σ_a′ π(a′|s′) Qπ(s′,a′) ]

(These are the standard forms; cross-check against the respective equations in the class notes.)

Policy Evaluation using Monte Carlo

Monte Carlo Methods: Use averaged returns from episodes to estimate value functions.

●​ No need for transition probabilities.
●​ Works by sampling complete episodes.
●​ Only works for episodic tasks.

Monte Carlo (MC) methods are used in Reinforcement Learning to estimate the value function
or action-value function of a given policy, by averaging actual returns observed from sample
episodes.

Key Idea:

To evaluate a policy, run the policy many times in the environment, and use the observed
rewards to estimate how good each state (or state-action pair) is.

No need for knowledge of transition probabilities or reward functions — just run episodes
and average results!

Monte Carlo Policy Evaluation (State-Value Version)

1.​ Generate episodes: Run multiple episodes using the policy.
2.​ Record returns: For each visit to a state s, compute the total reward from that point onward.
3.​ Average returns: Estimate the value of each state as the average of all returns following that state.
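
A toy sketch of (every-visit) Monte Carlo evaluation on a made-up 1-D random walk; the environment and all names are illustrative, not from the notes:

```python
# States 0..4; episodes end at either edge; reward +1 only for reaching
# state 4. We estimate V by averaging observed returns per visited state.
import random
from collections import defaultdict

GAMMA = 1.0
TERMINAL = {0, 4}

def run_episode(start=2):
    """Follow a fixed random policy and record (state, reward) pairs."""
    state, trajectory = start, []
    while state not in TERMINAL:
        next_state = state + random.choice([-1, 1])
        reward = 1.0 if next_state == 4 else 0.0
        trajectory.append((state, reward))
        state = next_state
    return trajectory

returns = defaultdict(list)
for _ in range(10000):
    G = 0.0
    for state, reward in reversed(run_episode()):
        G = reward + GAMMA * G    # return from this visit onward
        returns[state].append(G)  # every-visit MC: record at each visit

V = {s: sum(g) / len(g) for s, g in returns.items()}
print(V)  # V[2] should come out near 0.5, V[1] near 0.25, V[3] near 0.75
```
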

Policy Iteration

1.​ Policy Evaluation: Estimate Vπ for the current policy π.
2.​ Policy Improvement: Update the policy by acting greedily with respect to Vπ.

Repeat until the policy stops changing.

Q-learning is a popular model-free reinforcement learning algorithm used to find the optimal
action-selection policy for an agent interacting with an environment.
It tells the agent what to do (which action to take in which state) to maximize total future
reward, even when the agent doesn't know anything about the environment’s dynamics (i.e., the
transition and reward functions).

Core Idea:

Q-learning learns a function called the Q-function, which estimates the expected future reward
for taking a given action a in a given state s, and then following the best policy afterward.
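
A compact Q-learning sketch on a made-up 5-state chain. The heart of it is the standard update rule Q(s,a) ← Q(s,a) + α[r + γ max_a′ Q(s′,a′) − Q(s,a)]; the environment itself is purely illustrative:

```python
# Q-learning on a toy chain: move left/right, reward +1 for reaching the
# right end (state 4). Episodes also end if the agent falls off at state 0.
import random

N_STATES, ACTIONS = 5, [-1, +1]       # actions: step left / step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state in (0, N_STATES - 1)
    return next_state, reward, done

for _ in range(2000):
    state, done = 2, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # off-policy update: bootstrap from the BEST next action
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                       - Q[(state, action)])
        state = next_state

# Learned greedy policy for the non-terminal states (should all be +1).
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(1, 4)})
```
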

SARSA stands for:

State–Action–Reward–State–Action

It is a model-free, on-policy reinforcement learning algorithm used to learn an action-value function (Q-values), similar to Q-learning, but with a key difference in how it updates its values.

Key Idea:

SARSA learns the Q-value of a state-action pair based on the action actually taken by the
agent (rather than the best possible one, like in Q-learning).
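
A SARSA sketch on the same made-up chain as the Q-learning example; the only substantive change is the update target, which bootstraps from the action the agent actually takes next:

```python
# SARSA (on-policy): the target is Q(s', a') for the next action actually
# chosen, not max over actions as in Q-learning.
import random

N_STATES, ACTIONS = 5, [-1, +1]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state in (0, N_STATES - 1)

def choose(state):
    """Epsilon-greedy action under the current Q-values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for _ in range(2000):
    state, done = 2, False
    action = choose(state)                # pick the first action up front
    while not done:
        next_state, reward, done = step(state, action)
        next_action = choose(next_state)  # the action we WILL take next
        # on-policy update: target uses Q(s', a'), not a max over actions
        Q[(state, action)] += ALPHA * (reward + GAMMA * Q[(next_state, next_action)]
                                       - Q[(state, action)])
        state, action = next_state, next_action

print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(1, 4)})
```
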

Module 5:
1. Recommender Systems

Recommender systems suggest items (products, movies, articles, etc.) based on user preferences. Two main approaches are:

A. Collaborative Filtering (CF)

Idea: Uses the preferences of similar users or items to make recommendations.

Types:

1.​ User-based CF
o​ Finds users with similar tastes (neighbors).
o​ Recommends items that neighbors liked.
o​ Similarity metrics: Cosine similarity, Pearson correlation.
2.​ Item-based CF
o​ Recommends items similar to what a user liked before.
o​ More scalable in large datasets than user-based CF.

Pros:

●​ No need for item metadata or content.
●​ Learns from actual user behavior.

Cons:

●​ Cold start: New users/items have no data.
●​ Sparsity: User-item matrix often has many missing values.
●​ Scalability: High computation cost with many users/items.
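
A tiny user-based CF sketch on a made-up ratings matrix (0 = not rated), scoring unseen items for one user via cosine-weighted neighbor ratings:

```python
# User-based collaborative filtering in miniature: find how similar the
# other users are to user 0, then average their ratings by similarity.
# (As a simplification, unrated items count as 0 in the weighted average.)
import numpy as np

ratings = np.array([
    [5, 4, 0, 1],   # user 0 (we recommend for this user)
    [4, 5, 4, 0],   # user 1
    [1, 0, 5, 4],   # user 2
], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

target = ratings[0]
sims = np.array([cosine(target, ratings[i]) for i in range(1, 3)])

# Predicted score per item = similarity-weighted average of neighbor ratings.
predictions = sims @ ratings[1:] / sims.sum()

unseen = target == 0  # items user 0 hasn't rated
print("predicted scores for unseen items:", predictions[unseen])
```
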

B. Content-Based Filtering

Idea: Recommends items similar to those a user has liked, based on item features.

Process:

1.​ Extract features from items (e.g., genre, keywords, product type).
2.​ Create user profiles from items the user has rated or interacted with.
3.​ Compute similarity between user profile and item profiles.

Techniques:

●​ TF-IDF (text-based features).
●​ Cosine similarity.
●​ Machine learning models (Naive Bayes, SVMs).

Pros:

●​ Can recommend new/unpopular items.
●​ Personalized to each user.

Cons:

●​ Requires good feature extraction.
●​ Limited diversity; may recommend similar types only.
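
A minimal content-based sketch (scikit-learn assumed; the item descriptions are invented): build TF-IDF vectors for the items, average the liked items into a user profile, and rank the rest by cosine similarity:

```python
# Content-based filtering in miniature: user profile = mean TF-IDF vector
# of liked items; recommend the most similar unliked items.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

items = [
    "action movie with car chases and explosions",
    "romantic comedy set in paris",
    "sci-fi action thriller with space battles",
    "quiet french romance drama",
]
liked = [0]  # the user liked item 0

tfidf = TfidfVectorizer()
item_vecs = tfidf.fit_transform(items)  # sparse TF-IDF feature matrix

# User profile = mean of the liked items' TF-IDF vectors.
user_profile = np.asarray(item_vecs[liked].mean(axis=0))

scores = cosine_similarity(user_profile, item_vecs)[0]
for idx in scores.argsort()[::-1]:      # best matches first
    if idx not in liked:
        print(round(scores[idx], 2), items[idx])
```
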

2. Artificial Neural Networks (ANNs)

An ANN is a computational model inspired by the human brain, used to approximate complex functions.

A. Perceptron (Single-Layer Neural Network)

Structure:

●​ Input layer → Output node
●​ Weights assigned to inputs.
●​ Activation function determines output.

Limitation:

●​ Can only solve linearly separable problems.
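
A from-scratch perceptron sketch learning the (linearly separable) AND function with the classic update rule; the learning rate and epoch count are arbitrary illustrative choices:

```python
# Perceptron: weighted sum, step activation, and the update rule
# w <- w + lr * (target - output) * x.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])  # AND truth table

w = np.zeros(2)
b = 0.0
lr = 0.1

for _ in range(20):                          # a few passes over the data
    for xi, target in zip(X, y):
        output = 1 if xi @ w + b > 0 else 0  # step activation
        error = target - output
        w += lr * error * xi                 # perceptron update rule
        b += lr * error

print(w, b)
print([1 if xi @ w + b > 0 else 0 for xi in X])  # matches y
```
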

B. Multilayer Perceptron (MLP)

MLPs have:

●​ One input layer
●​ One or more hidden layers
●​ One output layer

Each neuron is connected to all neurons in the next layer.

Uses:

●​ Can model complex, non-linear relationships.
●​ Fundamental to deep learning models.

3. Backpropagation

A supervised learning algorithm for training MLPs.

Steps:

1.​ Forward Pass:
o​ Compute the output by passing inputs through the network.
2.​ Loss Calculation:
o​ Compute the error between predicted and actual output.
o​ Common loss functions: MSE, Cross-Entropy.
3.​ Backward Pass:
o​ Compute gradients using the chain rule.
o​ Propagate errors backward through the network.

Optimization Algorithms:

●​ Stochastic Gradient Descent (SGD)
●​ Adam, RMSProp, etc.
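
A from-scratch sketch of the three steps above on the XOR problem, with sigmoid activations and MSE loss (the hidden-layer size, learning rate, and epoch count are arbitrary illustrative choices):

```python
# Backpropagation by hand: forward pass, loss, backward pass (chain rule),
# then a plain gradient-descent step.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)    # hidden layer (4 units)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)    # output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 1.0

for epoch in range(10000):
    # 1. Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # 2. Loss calculation (mean squared error)
    loss = np.mean((out - y) ** 2)

    # 3. Backward pass: chain rule, layer by layer
    d_out = (out - y) * out * (1 - out)  # error signal at the output
    d_h = (d_out @ W2.T) * h * (1 - h)   # error propagated to the hidden layer

    # Gradient descent step
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

# Outputs should approach [0, 1, 1, 0]; an occasional run may stall in a
# local minimum, since MSE + sigmoid on XOR is not guaranteed to converge.
print(loss, out.ravel().round(2))
```
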

4. Deep Learning

A subfield of machine learning using multi-layer neural networks to learn high-level representations from
data.

Characteristics:

●​ Composed of many layers (deep).
●​ Learns from raw data (minimal feature engineering).
●​ Performs well on large datasets and complex tasks.

Common Deep Learning Architectures:

●​ Convolutional Neural Networks (CNNs):
o​ For image and spatial data.
o​ Layers: Conv, Pooling, Fully connected.
●​ Recurrent Neural Networks (RNNs):
o​ For sequential data (text, time series).
o​ Maintain memory across steps.
o​ LSTM and GRU improve on basic RNNs.

●​ Transformers:
o​ State-of-the-art in NLP (e.g., BERT, GPT).
o​ Uses a self-attention mechanism.

Applications:

●​ Image classification
●​ Natural language processing
●​ Speech recognition
●​ Recommender systems
●​ Autonomous vehicles

Take help from YouTube videos wherever required; plenty of material is available. The above notes are brief, so study well.
