FRM Part 1 Quants 2023 ML

1. Linear regression and logistic regression are commonly used supervised learning algorithms for predicting continuous and categorical variables respectively.
2. Decision trees can be used for both classification and regression problems. They work by recursively splitting the data into purer child nodes based on entropy or Gini impurity measures.
3. Principal component analysis (PCA) is an unsupervised technique that reduces the dimensionality of highly correlated data by transforming it into a smaller number of uncorrelated principal components.

A. SUPERVISED LEARNING

LINEAR REGRESSION: PREDICTION ALGORITHM


Explores the linear relationship between the Independent Variables (X) and the Dependent
Variable (Y), where Y is a continuous variable, i.e., lies between -∞ and ∞,
such that [ y = b0 + b1·x1 + b2·x2 ]. Ex: Sales = 11.75 + 3.75 (Television) + 2.23 (Magazine)
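A minimal sketch in Python evaluating the fitted equation from the example above; the ad-spend values are hypothetical:

```python
# Minimal sketch: evaluating the fitted linear regression from the example above.
# Coefficients come from the notes; the ad-spend values are made up for illustration.
b0, b1, b2 = 11.75, 3.75, 2.23

television, magazine = 10.0, 5.0                 # hypothetical spend in each channel
sales = b0 + b1 * television + b2 * magazine
print(f"Predicted sales: {sales:.2f}")           # 11.75 + 37.50 + 11.15 = 60.40
```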
THINGS TO CHECK IN A MODEL

Model Accuracy & Adequacy: measure MSE
Significance: hypothesis testing
Generalization: divide the data into
  - Training Set (train the model on this data)
  - Validation Set (validate & tune the model)
  - Test Set (test the model on out-of-sample data)
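A minimal sketch of this three-way split, assuming scikit-learn; the 60/20/20 proportions and the synthetic data are illustrative assumptions, not from the notes:

```python
# Minimal sketch: splitting data into training / validation / test sets.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                   # 1000 observations, 5 features (synthetic)
y = X @ np.array([3.75, 2.23, 0.0, 0.0, 0.0]) + rng.normal(size=1000)

# First carve off 20% as the out-of-sample test set...
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# ...then split the remainder into training (60% overall) and validation (20% overall).
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
```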

TESTING DATA & ERRORS

1. MSE on the Training Set
2. MSE (Validation) > MSE (Training) → the BIAS-VARIANCE TRADE-OFF

BIAS ERROR (problem of underfitting). Solution: add more features.
VARIANCE ERROR (problem of overfitting). Solution: penalized regression.

LASSO          Penalty Term: [ λ × Σ|βᵢ| ]
RIDGE          Penalty Term: [ λ × Σ(βᵢ)² ]
ELASTIC NETS   [ RIDGE + LASSO ]

Penalized regression includes a constraint such that the regression
coefficients are chosen to minimize {SSE + Penalty Term}. A feature must
make a sufficient contribution to model fit to offset the penalty from including it.

How to choose λ? Use K-FOLD CROSS-VALIDATION.
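A minimal sketch of choosing λ by k-fold cross-validation with scikit-learn's LassoCV (scikit-learn calls λ `alpha`; the candidate grid is an arbitrary assumption, and the sketch reuses the split from above):

```python
# Minimal sketch: choosing lambda (called `alpha` in scikit-learn) by 5-fold CV.
from sklearn.linear_model import LassoCV

lasso = LassoCV(alphas=[0.01, 0.1, 1.0, 10.0], cv=5).fit(X_train, y_train)
print("chosen lambda:", lasso.alpha_)
print("coefficients:", lasso.coef_)   # weak features are shrunk exactly to zero
```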


LOGISTIC REGRESSION

CLASSIFICATION ALGORITHM
The Target Variable (y) is a category/class. Ex: y = Defaulter / Non-defaulter, i.e., 1 or 0
x1 = Gender, x2 = Income, x3 = Age

Why not Linear Regression?

Linear regression can give Ŷ any value between [-∞ to ∞], but we want Y as 1
or 0. In logistic regression we map Ŷ between 0 and 1, like probabilities,
using a link function, and estimate the coefficients by maximizing the
likelihood (MLE).

LINK FUNCTION

The logistic (sigmoid) function is the standard link: P(y = 1) = 1 / (1 + e^-(b0 + b1x1 + b2x2 + ...)),
which maps the linear score onto a probability between 0 and 1.
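A minimal sketch, assuming scikit-learn and synthetic defaulter data:

```python
# Minimal sketch: logistic regression on synthetic defaulter data.
# The sigmoid link maps the linear score to a probability; fitting maximizes the likelihood.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))      # columns: gender, income, age (standardized, synthetic)
y = (X @ np.array([0.5, -1.5, 0.8]) + rng.normal(size=500) > 0).astype(int)  # 1 = defaulter

clf = LogisticRegression().fit(X, y)
p_default = clf.predict_proba(X[:1])[0, 1]   # mapped between 0 and 1 by the sigmoid link
print(f"P(defaulter) for first borrower: {p_default:.2f}")
```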


DECISION TREE aka CLASSIFICATION & REGRESSION TREE
A Classification & Regression Tree (CART) is a supervised ML technique that can be
applied to predict either a Categorical Target Variable (y), producing a
CLASSIFICATION TREE, or a Continuous Target Variable, producing a REGRESSION TREE.
To start the decision tree, identify the Root Node: the feature that gives the lowest
misclassification, based on the lowest ENTROPY or GINI measure.

ENTROPY or IMPURITY = [ -P log(P) - (1-P) log(1-P) ]    GINI MEASURE = Σ P × (1-P)
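A minimal sketch of the two impurity measures for a binary node (using base-2 logs, an assumption; any base works up to scaling):

```python
# Minimal sketch: binary-node impurity measures from the formulas above.
import math

def entropy(p: float) -> float:
    """-P log(P) - (1-P) log(1-P); zero at a pure node (P = 0 or 1)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gini(p: float) -> float:
    """Sum of P(1-P) over the two classes: p(1-p) + (1-p)p."""
    return 2 * p * (1 - p)

print(entropy(0.5), gini(0.5))   # maximal impurity: 1.0 and 0.5
print(entropy(0.9), gini(0.9))   # purer node: lower impurity
```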

DECISION TREE & PARTITIONING OF FEATURES

[Figure: example tree. Root Node: Travel Cost, splitting into Standard / Expensive / Cheap.
Internal Nodes: Gender (Male / Female) and Car Owned (1 / 0).
Terminal or Leaf Nodes: Car, Train, Bus.]

REGRESSION TREE

Same as a Classification Tree, with the difference being that splits are chosen
on SSE reduction instead of an ENTROPY measure.

ADVANTAGE OF CART

We can take the maximum possible leaf nodes & achieve 100% (in-sample) accuracy.

DISADVANTAGE OF CART

Problem of over-fitting, addressed via STOPPING CRITERIA and PRUNING.

STOPPING CRITERIA

1. Maximum Depth: limit the growth of the tree by specifying the number of splits.
2. Minimum Observations: the minimum number of observations specified to grow the tree further.
3. Maximum Decision Nodes: cap the number of decision nodes.

PRUNING

Just like penalized regression, we use a λ parameter to prune the tree,
and λ is chosen using k-fold cross-validation.

RANDOM FOREST

Similar to bagging, with an extension of the idea: take a random set of
observations + a random set of features too.

ENSEMBLE LEARNING & RANDOM FOREST

Ensemble Learning: instead of basing predictions on the result of a single
tree, use a group of trees called an ensemble; the averaged result of all
the trees converges towards a more accurate prediction.
This technique is called Bagging or Bootstrap Aggregation.
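A minimal sketch contrasting a tree grown with stopping criteria against a random forest, assuming scikit-learn; the dataset and hyperparameter values are illustrative assumptions:

```python
# Minimal sketch: CART with stopping criteria vs. a random forest (bagging + random features).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stopping criteria: maximum depth, minimum observations per split, maximum leaf nodes.
tree = DecisionTreeClassifier(max_depth=3, min_samples_split=20, max_leaf_nodes=8).fit(X_tr, y_tr)

# Random forest: each tree sees a bootstrap sample AND a random subset of features per split.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0).fit(X_tr, y_tr)

print("tree  :", tree.score(X_te, y_te))     # out-of-sample accuracy of the pruned tree
print("forest:", forest.score(X_te, y_te))   # the ensemble typically generalizes better
```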

K-NEAREST NEIGHBOUR

KNN is based on the intuition that a new observation will be classified
into the class that has the majority among its nearest neighbours.
"Like Neighbours, Like You."
Find the distance of each point to the new observation and decide the
class based on K, where K = the number of Nearest Neighbours.

[Figure: scatter of two classes. The new observation here will be
classified as class 1 based on K = 7.]

ADVANTAGES OF KNN

- Straightforward & powerful
- Non-parametric
- Easy for multi-class classification

CHALLENGES OF KNN

- Choosing K: a large K can dilute the concept of the nearest neighbour,
  while a small K can lead to a high error rate.
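A minimal sketch with K = 7, matching the figure above; scikit-learn and the synthetic blobs are assumptions:

```python
# Minimal sketch: KNN with K = 7 on synthetic two-class data.
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

X, y = make_blobs(n_samples=200, centers=2, random_state=0)
knn = KNeighborsClassifier(n_neighbors=7).fit(X, y)   # K = 7 nearest neighbours

new_obs = [[0.0, 2.0]]                                # a hypothetical new observation
print("predicted class:", knn.predict(new_obs)[0])    # majority vote among the 7 neighbours
```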

SUPPORT VECTOR MACHINE

A Support Vector Machine is a linear classifier that
determines the hyperplane that optimally separates the
observations into 2 sets of data points.

The best hyperplane is the plane that maximizes the margin
width, based on the support vectors (touch points).
"If the support vectors are separated correctly, the rest of the
observations will be classified correctly as well."

If the data is linearly separable → Hard Margin Classification

Real-world data may not be perfectly linearly separable; in that case:

NON-LINEAR SVM    or    SOFT MARGIN SVM

SVM Applications: suited for small-to-medium, complex, high-dimensional data.
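A minimal sketch, assuming scikit-learn; a very large C is used to approximate a hard margin, which is an illustrative choice, not an exact hard-margin solver:

```python
# Minimal sketch: hard-ish vs. soft margin via the C parameter, plus a non-linear kernel.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

hard_ish = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C ≈ hard margin (few violations)
soft = SVC(kernel="linear", C=1.0).fit(X, y)       # soft margin tolerates some misclassification
nonlin = SVC(kernel="rbf").fit(X, y)               # non-linear SVM via the RBF kernel

print(len(hard_ish.support_), len(soft.support_))  # number of support vectors in each fit
```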

PRINCIPAL COMPONENT ANALYSIS

PCA is used to reduce highly correlated features of the data into
a few main uncorrelated composite variables
called Principal Components (PCs).

Suppose our data has 8 features. We find PC1
by giving weights to the original features in
such a manner that the variance (information)
explained is maximized.

PC2 will be found again by maximizing the (remaining) variance
explained, subject to the constraint that PC1 & PC2 are
uncorrelated.

Eigenvectors: the weights of the new, mutually uncorrelated variables.
Eigenvalues: the variance explained by each Eigenvector (PC).

The trade-off between complexity & accuracy is visualized by a SCREE PLOT.

[Figure: scree plot. Here, keeping 3 PCs, 78% of the variance is
explained while complexity is reduced from 8 to 3 features.]
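A minimal sketch, assuming scikit-learn; the synthetic data is built to be highly correlated (8 features driven by 3 underlying factors), so the exact variance shares will differ from the 78% in the scree-plot example:

```python
# Minimal sketch: reducing 8 correlated features to 3 principal components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))                                       # 3 hidden factors
X = base @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(500, 8))   # 8 correlated features

pca = PCA(n_components=3).fit(X)
print(pca.explained_variance_ratio_)         # eigenvalue share per component
print(pca.explained_variance_ratio_.sum())   # cumulative variance kept with 3 PCs
X_reduced = pca.transform(X)                 # complexity reduced from 8 to 3 features
```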


CLUSTERING ALGORITHMS
Clustering means sorting observations into groups such that items within the
same cluster are similar, and observations in 2 different clusters are as dissimilar as
possible: a property known as "separation."

CASE 1: K (the number of clusters) is known → K-MEANS CLUSTERING

STEPS (see the sketch below):
1. Decide K (a hyperparameter).
2. Randomly assign each observation to a particular cluster.
3. Find the centroid (average) of each cluster.
4. Calculate the (Euclidean) distance between each observation & each centroid.
5. Based on distance, assign each observation to its closest centroid.
Repeat steps 3 to 5 till observations stop shifting to new groups, i.e.,
the algorithm converges to its solution.
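A minimal sketch of K-means following the steps above, assuming scikit-learn; K = 2 and the synthetic blobs are illustrative:

```python
# Minimal sketch: K-means with K = 2 on synthetic data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=2, random_state=0)

# KMeans iterates the assign-to-closest-centroid / recompute-centroid loop until convergence.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # final centroids
print(km.labels_[:10])       # cluster assignment of the first 10 observations
```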

CASE 2: K (the number of clusters) is unknown → HIERARCHICAL CLUSTERING

AGGLOMERATIVE CLUSTERING (BOTTOM-UP): Consider each observation as a single
cluster, then find the 2 closest clusters & combine them into one new, larger
cluster. Repeat this process iteratively till all observations are clumped
into a single cluster.

DIVISIVE CLUSTERING (TOP-DOWN): Consider all observations as belonging to
a single cluster. Divide the observations based on some measure of distance
and continue the partitioning until each cluster contains only one single
observation.
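A minimal sketch of bottom-up clustering, assuming scikit-learn; the distance threshold is an arbitrary illustrative value:

```python
# Minimal sketch: agglomerative (bottom-up) clustering; K need not be fixed in advance.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Cutting the merge tree at a distance threshold lets the data choose the number of clusters.
agg = AgglomerativeClustering(n_clusters=None, distance_threshold=10.0).fit(X)
print("clusters found:", agg.n_clusters_)
```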


NEURAL NETWORKS
Each node within a hidden layer has 2 functional parts:
1. Summation Operator
2. Activation/Transformation Function
Summation Operator: multiplies each input by a
weight and sums the weighted values to form
the total Net Input.
The total Net Input is then passed to the activation
function, which transforms the input into the final
output of the node.
The output of the hidden layer is transmitted to the next
set of nodes or to the output layer. The output layer
again contains a summation operator & an
activation function.
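A minimal sketch of the two functional parts in NumPy; the sigmoid activation and the random weights are illustrative assumptions:

```python
# Minimal sketch: one hidden-layer forward pass showing the two parts of each node.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])                            # hypothetical inputs
W_hidden = np.random.default_rng(0).normal(size=(4, 3))   # 4 hidden nodes, 3 inputs each
W_out = np.random.default_rng(1).normal(size=(1, 4))

net_input = W_hidden @ x               # 1) summation operator: weighted sum per node
hidden_out = sigmoid(net_input)        # 2) activation function transforms the net input
output = sigmoid(W_out @ hidden_out)   # the output layer repeats the same two steps
print(output)
```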

ADVANTAGES OF NEURAL NETWORKS

- Capture complex non-linear interactions among features.

DISADVANTAGES OF NEURAL NETWORKS

- Risk of overfitting
- Black box: not interpretable

DEEP LEARNING & REINFORCEMENT LEARNING

DEEP LEARNING
If the number of hidden layers is very large, the neural network is called a Deep Learning Net (DLN).
REINFORCEMENT LEARNING
An RL algorithm involves an agent that should perform the actions that will maximise its rewards
over time, taking into consideration the constraints of its environment.
Ex: A virtual gamer (Agent) uses console commands (Actions) with the information on the
screen (Environment) to maximise his/her score (Reward).
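A minimal sketch of one way an agent can learn from rewards (tabular Q-learning); this specific algorithm, the toy sizes, and the α/γ values are assumptions beyond the notes:

```python
# Minimal sketch: the tabular Q-learning update an RL agent might use (toy, illustrative).
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))   # the agent's value estimate per (state, action)
alpha, gamma = 0.1, 0.9               # learning rate and discount factor (assumed values)

def update(state, action, reward, next_state):
    # Move Q(s, a) toward the reward plus the discounted best value of the next state.
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])

update(state=0, action=1, reward=1.0, next_state=2)
print(Q[0])
```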