Intro To Machine Learning New

The document provides an overview of machine learning models, contrasting them with classical econometrics, and detailing various types of machine learning including supervised, unsupervised, and reinforcement learning. It discusses data preparation techniques, the importance of data cleaning, and methods for dimensionality reduction such as principal components analysis. Additionally, it covers the concepts of overfitting and underfitting, the role of training, validation, and test datasets, and introduces natural language processing as a key application of machine learning.


FRM PART 1

MACHINE LEARNING MODELS

[email protected]

(+91) 8652653607
www.vardeez.com
LO 25.a: Discuss the philosophical and practical differences between machine learning techniques and classical econometrics.
What Is Machine Learning?
Machine learning (ML) is an umbrella term covering a range of techniques in which a model is trained to recognize patterns in data, for applications including prediction and classification.
In machine learning, the data decide what the model will include; no specific hypothesis from an analyst is tested as part of the process.

What Is Classical Econometrics?

Econometrics uses statistics to study how different economic factors are related to each other and to make predictions.
Here, an analyst chooses the model and the variables, while a computer algorithm is used to estimate the parameters and test their significance.

Why Machine Learning Is Better

It can handle large amounts of data and provides greater flexibility in managing that information.
It can also handle non-linear interactions between variables.
LO 25.g: Differentiate among unsupervised, supervised, and reinforcement learning models.
Supervised Learning
This is concerned with prediction: the model learns from labeled data and is trained to map inputs to outputs based on a set of examples.
When we use this: when we want to predict a value (e.g., a house price) or classify loans as "will pay" or "will default."
Algorithms for SL: logistic regression, decision trees, K-nearest neighbors.

Unsupervised Learning
This is concerned with recognizing patterns in data with no explicit target; the model learns the structure of the data by itself, and no labeled data are provided to it.
When we use this: when we want to cluster the data or find a small number of factors that explain the data.
Algorithms for UL: K-means clustering, principal components analysis, anomaly detection.

(Figure: example of labeled data with features such as size and location.)

DATA PREPARATION
Many machine-learning approaches require all the variables to be measured on the same scale; otherwise, the technique will not be
able to determine the parameters appropriately.
There are broadly two methods to achieve this rescaling:

1) Standardization: This involves subtracting the sample mean of each variable from all observations on that variable and dividing by its standard deviation:

x_standardized = (x − sample mean) / standard deviation

(Figure: the standardization calculation for bank A and the customers feature.)
2) Normalization: This process, also called the min-max transformation, creates a variable between zero and one that will not usually have a zero mean or unit variance:

x_normalized = (x − min) / (max − min)

(Figure: the normalization calculation for bank A and the customers feature.)
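As a rough illustration, here is a minimal Python sketch of both rescaling methods applied to a made-up feature vector (the numbers are hypothetical and are not the bank A figures referenced above):

```python
# Minimal sketch of the two rescaling methods described above, using NumPy.
# The feature values are made up for illustration.
import numpy as np

x = np.array([120.0, 85.0, 230.0, 150.0, 95.0])  # hypothetical feature values

# Standardization: subtract the sample mean and divide by the standard deviation.
x_standardized = (x - x.mean()) / x.std(ddof=1)

# Normalization (min-max): rescale so the values lie between zero and one.
x_normalized = (x - x.min()) / (x.max() - x.min())

print(x_standardized)
print(x_normalized)
```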
Data Cleaning
Data cleaning is an important part of machine learning and can take up to 80% of a data analyst's time.
Large data sets usually have issues that need to be fixed, and good data cleaning can make all the difference between a successful and an unsuccessful machine-learning project.

Several reasons for data cleaning include:


1. Inconsistent recording. For data to be read correctly, it is important that all data is recorded in the same way.
2. Unwanted observations. Observations not relevant to the task at hand should be removed.
3. Duplicate observations. These should be removed to avoid biases.
4. Outliers. Observations on a feature that are several standard deviations from the mean should be checked carefully, as they can
have a big effect on results.
5. Missing data. This is the most common problem. If there are only a small number of observations with missing data, they can be removed. Otherwise, one approach is to replace missing observations on a feature with the mean or median of the observations on the feature that are not missing. Alternatively, the missing observations can be estimated in some way from observations on other features (see the sketch after this list).
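The sketch below illustrates the two remedies from item 5 with pandas; the small table and its column names are hypothetical:

```python
# Minimal sketch of handling missing data: drop the rows, or impute with the median.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "income": [52_000, 61_000, np.nan, 48_000, np.nan, 75_000],
    "age":    [34, 41, 29, np.nan, 52, 45],
})

# Option 1: drop the rows that contain missing values (fine if there are only a few).
df_dropped = df.dropna()

# Option 2: replace missing observations with the median (or mean) of each feature.
df_imputed = df.fillna(df.median(numeric_only=True))

print(df_dropped)
print(df_imputed)
```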
LO 25.d: Use principal components analysis to reduce the dimensionality of a set of features.
A popular statistical technique for dimension reduction in unsupervised learning models is principal components analysis (PCA).
The goal of PCA is to convey almost the same information using a small number of uncorrelated components (i.e., variables) as is provided by a large number of correlated variables.
Thus, in a machine learning model, PCA is used to reduce the number of features.

PCA is often applied to yield curve movements, producing a small number of uncorrelated components that describe the movements of the curve.

The total variance is equal to the following: (12.96)² + (5.82)² + (2.14)² + (1.79)² = 209.62

Based on a review of seven Treasury rates over a 10-year period (120 months), the first three observed components were responsible for 99% of the overall variation in yield movements, reflecting the high correlation between yield movements.
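A minimal scikit-learn sketch of the idea is shown below. The simulated 120 × 7 matrix of correlated "rates" is purely illustrative and is not the Treasury data set summarized above:

```python
# Minimal sketch of PCA for dimension reduction, using scikit-learn.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
level = rng.normal(size=(120, 1))                # a common factor across maturities
rates = level + 0.1 * rng.normal(size=(120, 7))  # 120 months x 7 correlated "rates"

pca = PCA(n_components=3)
components = pca.fit_transform(rates)            # the first three principal components

# Fraction of total variance explained by each component, and the cumulative sum.
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.cumsum())
```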
LO 25.e: Describe how the K-means algorithm separates a sample into clusters.
To identify the structure of a dataset, an unsupervised K-means algorithm can be used to separate dataset observations into clusters.
The value K represents the number of clusters and is set by an analyst.

The centers of the data clusters are called centroids and are initially randomly chosen.
Each data point is allocated to its nearest centroid and then the centroid is recalculated to be at the center of all the data points
assigned to it. This process continues until the centroids remain constant.
The Euclidean distance:
The Euclidean distance is the square root of the sum, over all the features, of the squared differences between the corresponding features of the two observations (e.g., two banks).

The Manhattan distance measure:

The Manhattan distance is the sum, over all the features, of the absolute differences between the corresponding feature pairs.
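Both distance measures are straightforward to compute; the NumPy sketch below uses two hypothetical banks, each described by three already-rescaled features:

```python
# Minimal sketch of the two distance measures between two feature vectors.
import numpy as np

bank_a = np.array([0.2, 0.8, 0.5])
bank_b = np.array([0.6, 0.1, 0.9])

euclidean = np.sqrt(np.sum((bank_a - bank_b) ** 2))  # square root of summed squared differences
manhattan = np.sum(np.abs(bank_a - bank_b))          # sum of absolute differences

print(euclidean, manhattan)
```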
Concept of Inertia
The goal of the K-means algorithm is to minimize the distance between each observed point and its centroid. A model's fit is better when the individual data points are close to their centroid. A lower inertia implies a better cluster fit. However, because inertia will always fall as more centroids are added, there is a limit to which adding more centroids adds value.

• Inertia, a measure of the distance (d) between each data point (j) and its centroid, is defined as the sum of the squared distances over all data points: inertia = Σⱼ dⱼ²

As an alternative approach, a silhouette coefficient can be used to choose K by comparing the distance between an observation and
other points in its own cluster to its distance to data points in the next closest cluster.
The highest silhouette score will produce the optimal value of K.
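The scikit-learn sketch below ties these ideas together on simulated two-feature data: for each candidate K it reports the inertia (which always falls as K rises) and the silhouette score (which should peak at the best K):

```python
# Minimal sketch of choosing K using inertia and the silhouette coefficient.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Three well-separated groups of 50 points each, in two dimensions.
data = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 3, 6)])

for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    sil = silhouette_score(data, km.labels_)
    # Inertia always falls as K rises; the silhouette score points to the best K.
    print(k, round(km.inertia_, 1), round(sil, 3))
```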
LO 25.b: Explain the differences among the training, validation, and test data sub-samples, and how each is used.
1) The Training Set: The training set is employed to estimate the model parameters; this is the part of the data from which the computer actually learns how best to represent its characteristics.

2) The Validation Set: It is used to select between competing models. We compare different alternative models to determine which one generalizes best to new data.

3) The Testing Set: This is retained to determine the final chosen model's effectiveness.
A few concerns need to be understood for these data sets:
1) An obvious question is, how much of the overall data available should be used for each sub-sample?
Although there is no fixed rule for how much data should go to each of the sets above, a typical allocation is two-thirds of the data going to the training set, one-sixth to the validation set, and the remaining one-sixth to the test set.

2) What if the data set is small?

If the training sample is too small, this can introduce biases into the parameter estimation, whereas if the validation sample is too small, model evaluation can be inaccurate, making it hard to identify the best specification.

If the data set is relatively small, k-fold cross-validation may be utilized.

This technique combines the training and validation data into a single sample, with the combined data (n observations) allocated into k sub-samples.

Note : The larger the data set, the lower the risk of improper allocations.
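A minimal scikit-learn sketch of the typical 2/3 - 1/6 - 1/6 allocation, followed by a k-fold split for the small-data case, is shown below (the data are simulated):

```python
# Minimal sketch of train/validation/test splitting and k-fold cross-validation.
import numpy as np
from sklearn.model_selection import train_test_split, KFold

X = np.arange(600).reshape(300, 2)
y = np.arange(300)

# Typical allocation: 2/3 training, 1/6 validation, 1/6 test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=1/3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# For a small data set: combine training and validation data and use k-fold CV.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    print(fold, len(train_idx), len(val_idx))
```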
LO 25.c: Understand the differences between and consequences of underfitting and overfitting, and propose potential remedies for
each
Overfitting:
Overfitting is a situation in which a model is chosen that is "too large" or excessively parameterized.
Overfitting gives a false impression of an excellent specification because the error rate on the training set will be very low (possibly close to zero). However, when applied to other (test) data, the model's performance will likely be poor and the model will not be able to generalize well.

Overfitting is associated with lower bias and higher variance.

Underfitting:
Underfitting is the opposite problem to overfitting and occurs when relevant patterns in the data remain uncaptured by the model.

Underfitting is associated with higher bias and lower variance errors.

Bias: the difference between the actual values and the values predicted by the model.
Variance: the estimation error that arises because the model is overly sensitive to the particular training sample.
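The sketch below illustrates the trade-off on simulated data: a degree-1 polynomial underfits (high bias), while a high-degree polynomial drives the training error toward zero but typically performs worse on the validation data (high variance). The data set and the polynomial degrees are arbitrary choices for illustration:

```python
# Minimal sketch of underfitting vs. overfitting using polynomial fits.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=40)
x_train, y_train = x[::2], y[::2]    # half the points for training
x_val, y_val = x[1::2], y[1::2]      # the other half for validation

for degree in (1, 3, 12):
    coefs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    val_err = np.mean((np.polyval(coefs, x_val) - y_val) ** 2)
    # Training error falls with degree; validation error eventually rises.
    print(degree, round(train_err, 3), round(val_err, 3))
```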
LO 25.h: Explain how reinforcement learning operates and how it is used in decision-making.
Reinforcement learning
Reinforcement learning involves the creation of a policy for decision-making, with the goal of maximizing a reward.
It uses a trial-and-error approach. Examples: game playing, robotics, and finding the best investment strategy.

The key areas of reinforcement learning are known as states, actions, and rewards:
a) States (S): define the environment.
b) Actions (A): represent the decisions taken.
c) Rewards (R): maximized when the best possible decision is made.

Reinforcement learning has many applications in risk management.


For example, it is used to determine the optimal way to buy or sell a large block of shares, to determine how a portfolio should be
managed, and to hedge derivatives portfolios

A disadvantage of reinforcement learning algorithms is that they tend to require larger amounts of training data than other machine-
learning approaches.

To determine actions taken for each state, the algorithm will choose between the best action already identified (known as exploitation)
and a new action (known as exploration).
The probability assigned to exploitation and exploration is p and 1 – p, respectively.
As more trials are completed and the algorithm has learned the superior strategies, the value of p increases

The Q-value, Q(S, A), is the expected value of taking an action (A) in a certain state (S). The best action to take in any given state (S) is the value of A that maximizes Q(S, A).

The Monte Carlo method may be deployed to evaluate actions (A) taken in states (S) and the subsequent rewards (R) that may result. With the α parameter set at a number such as 0.01 or 0.05, the Q-value is updated after each trial as: Q_new(S, A) = Q_old(S, A) + α[R − Q_old(S, A)], where R is the total subsequent reward observed.

The temporal difference learning method, an alternative to the Monte Carlo method, assumes the best strategy identified thus far is the one that will be followed going forward and looks only one decision ahead.
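The sketch below illustrates exploitation versus exploration and the Q-value update for a single state with three possible actions. The reward distributions, the schedule for p, and the update rule Q ← Q + α(R − Q) are illustrative assumptions rather than the exact example from the reading:

```python
# Minimal sketch of exploitation vs. exploration with a simple Q-value update,
# for one state with three possible actions. All numbers are hypothetical.
import random

true_mean_reward = [1.0, 2.0, 1.5]   # hypothetical expected reward of each action
Q = [0.0, 0.0, 0.0]                  # estimated Q-value of each action
alpha, p = 0.05, 0.0                 # learning rate; p = probability of exploitation

for trial in range(1, 5001):
    if random.random() < p:
        action = max(range(3), key=lambda a: Q[a])   # exploitation: best action so far
    else:
        action = random.randrange(3)                 # exploration: a random action
    reward = random.gauss(true_mean_reward[action], 1.0)
    Q[action] += alpha * (reward - Q[action])        # move the estimate toward the observed reward
    p = min(0.95, trial / 5000)                      # exploit more as learning progresses

print([round(q, 2) for q in Q])  # the estimate for action 1 should end up highest
```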
LO 25.f: Be aware of natural language processing and how it is used.
Natural language processing (NLP): Sometimes also known as text mining, NLP is an aspect of machine learning concerned with understanding and analyzing human language, both written and spoken.
Examples: automated virtual assistants, and classifying newswire statements (e.g., as corporate news, educational news, and so on).
Benefits of NLP: NLP offers speed and document review free of the inconsistencies or bias found in human reviews.

Steps in NLP :
1) Capturing the language in a transcript or a written document; 2) Pre-processing the text; and 3) Analyzing it for a particular purpose.

Preprocessing text requires the following steps:


1) Tokenize: The document must be tokenized, which means identifying only the words (i.e., removing punctuation, symbols, and spacing) and converting them all to lowercase.
2) Removing stopwords such as “the,” “has,” and “a.” These words are designed to make sentences flow but have no other value.
3) Stemming, which means replacing words with their stems. For example, “arguing,” “argued,” and “argues” map to “argu.”
4) Lemmatization, which is replacing words with their lemmas. For example, “worse” maps to “bad.” This is a similar concept to
stemming, but the lemma will be an actual word.
5) N-grams may be considered, which are groups of words that have meaning when placed together as opposed to being considered
individually. For example, the trigram “exceed analyst expectations” may be more meaningful than the separate words “exceed,”
“analyst,” and “expectations.”
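The sketch below walks through the preprocessing steps using only the Python standard library; the sentence, stopword list, and crude suffix-stripping "stemmer" are simplified stand-ins for what a real NLP library would do (lemmatization is omitted because it requires a dictionary):

```python
# Minimal sketch of NLP preprocessing: tokenize, remove stopwords, stem, form n-grams.
import re

text = "The company has exceeded analyst expectations, and profits are growing."
stopwords = {"the", "has", "a", "and", "are"}

# 1) Tokenize: keep only the words, in lowercase.
tokens = re.findall(r"[a-z]+", text.lower())

# 2) Remove stopwords.
tokens = [t for t in tokens if t not in stopwords]

# 3) Stem (very crudely) by stripping common suffixes.
def stem(word):
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

stems = [stem(t) for t in tokens]

# 5) Form bigrams (n-grams with n = 2) from neighbouring tokens.
bigrams = list(zip(tokens, tokens[1:]))

print(tokens)
print(stems)
print(bigrams)
```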
(Figure: worked example of tokenization, stopword removal, stemming, and lemmatization.)

NLP will use an inventory of sentiment words to assess whether documents such as corporate news releases should be considered positive, negative, or neutral.
