
Structuring Machine Learning Projects

30 July 2020 12:25

Orthogonalization:
Knowing what parameter/hyperparameter to tune in the model, in order to achieve what effect. If we want some change in the model, we know what parameter to tune to bring that change.

Example: For supervised learning, for every step we have the following knobs to tune the model:
1. Fit the training set well on the cost function --> (Bigger network, different optimization algorithm, etc.)
2. Fit the dev set well on the cost function --> (Regularization, bigger training set, etc.)
3. Fit the test set well on the cost function --> (Bigger dev set, etc.)
4. Performs well in the real world --> (Change the dev set, or the cost function, etc.)

SINGLE NUMBER EVALUATION METRIC:

Set up the goal of your project. How will you measure the success of the project?

Precision = True positives / (True positives + False positives)

Amongst all the examples classified as positive, how many did it classify correctly?
Example - A classifier has 95% precision. If it classifies something as a cat, then there is a 95% chance that it is actually a cat.

Recall = True positives / (True positives + False negatives)

Amongst all the positives in the data, how many did it identify correctly?
Example - A classifier has 98% recall. If we give the classifier 100 examples, it should identify about 98 of them correctly.

But both of them are required to evaluate a classifier. We want both of them. The standard way to combine precision and recall is the F1 score.

F1 score = 2 / (1/P + 1/R)   (Harmonic Mean of precision P and recall R)

The advantage is that even if one of them is low, the F1 score will be low. We want the classifier with the highest F1 score.

Another example:
If we have 2 classifiers that work differently for different geographies, how can we decide which one works better overall? Just take the average of each classifier's score over all the geographies and compare this average for the 2 classifiers.
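As a quick illustration (a minimal sketch, not part of the original notes; the counts below are made up), all three metrics can be computed directly from true positive, false positive, and false negative counts:

def precision_recall_f1(tp, fp, fn):
    # Precision: of everything predicted positive, how much was right?
    precision = tp / (tp + fp)
    # Recall: of all actual positives, how much did we find?
    recall = tp / (tp + fn)
    # F1: harmonic mean of precision and recall
    f1 = 2 / (1 / precision + 1 / recall)
    return precision, recall, f1

# Hypothetical counts (tp, fp, fn) for two classifiers evaluated on the same dev set
for name, counts in {"A": (90, 10, 15), "B": (85, 5, 20)}.items():
    p, r, f1 = precision_recall_f1(*counts)
    print(f"Classifier {name}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")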

Satisficing and Optimizing metric:

Suppose we have the following example:
[Table comparing several classifiers on two metrics: accuracy and running time.]

Here, we can't choose just 1 of the metrics (both are important), and also, we can't combine both of them (different units).

So, we can form a metric as the following:
MAXIMIZE the accuracy
Subject to Running time ≤ 100 ms

Using this metric, we get that B is the best choice.

Here, Accuracy is the optimizing metric (which we want to achieve), and Running time is the satisficing metric (a minimum condition which we just want to satisfy).
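A small sketch of how such a combined rule could be applied in code; the classifier names and the (accuracy, running time) numbers are hypothetical stand-ins for the table above. Keep only the classifiers that satisfy the running-time constraint, then pick the one with the highest accuracy among them.

# Hypothetical (accuracy, running time in ms) measurements for three classifiers
classifiers = {"A": (0.90, 80), "B": (0.92, 95), "C": (0.95, 1500)}

# Satisficing metric: running time must be <= 100 ms
feasible = {name: (acc, ms) for name, (acc, ms) in classifiers.items() if ms <= 100}

# Optimizing metric: among the feasible ones, maximize accuracy
best = max(feasible, key=lambda name: feasible[name][0])
print("Best choice:", best)   # -> B under these made-up numbers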

Dev and test set should come from the same distribution.
• For small datasets, say ≤ 500,000 examples:
  ○ 70% training, 30% dev, or
  ○ 60% training, 20% dev, 20% test
• But for very large datasets, say 1,000,000 examples:
  ○ 98% train, 1% dev, 1% test
Set your test set to be big enough to give high confidence on the overall performance of your system.
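A minimal sketch of the 98/1/1 split, using placeholder indices in place of real examples: shuffle once, then carve out train, dev, and test so that dev and test come from the same shuffled pool.

import random

n_examples = 1_000_000
indices = list(range(n_examples))
random.seed(0)
random.shuffle(indices)

# 98% train, 1% dev, 1% test for a very large dataset
n_train = int(0.98 * n_examples)
n_dev = int(0.01 * n_examples)

train_idx = indices[:n_train]
dev_idx = indices[n_train:n_train + n_dev]
test_idx = indices[n_train + n_dev:]
print(len(train_idx), len(dev_idx), len(test_idx))  # 980000 10000 10000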

Bayes' optimal error:

Maximum possible accuracy, minimum possible error. No system can surpass this performance; it is even better than human performance.

Avoidable bias = Err(classifier on training data) - Err(Bayes' classifier)

For example, suppose the Bayes' error / human error = 7.5%
and the training error of our classifier is 8%.
The avoidable bias is only (8 - 7.5) = 0.5%, because we can't get below 7.5% anyway.

Improving your model performance:

Avoidable bias / Variance
Look at the differences between:
1. Human Level Error
2. Training Error
3. Dev Error
This will give you the avoidable bias / variance trade-off, and you can improve accordingly (a short sketch of this diagnosis follows at the end of this section).
For avoiding avoidable bias: train a bigger model, train longer, use better optimization algorithms, change the NN architecture, perform a hyperparameter search.
For avoiding variance: get more data, perform regularization, change the NN architecture, do a hyperparameter search.
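A minimal sketch of that diagnosis, assuming the three error numbers are already measured (the example values below are made up): compare human-level, training, and dev error to decide whether avoidable bias or variance deserves attention first.

def diagnose(human_err, train_err, dev_err):
    # Avoidable bias: gap between training error and human-level (proxy for Bayes') error
    avoidable_bias = train_err - human_err
    # Variance: gap between dev error and training error
    variance = dev_err - train_err
    if avoidable_bias >= variance:
        focus = "avoidable bias (bigger model, train longer, better optimization, ...)"
    else:
        focus = "variance (more data, regularization, ...)"
    return avoidable_bias, variance, focus

# Made-up example: human-level 7.5%, training 8%, dev 10%
print(diagnose(0.075, 0.08, 0.10))
# avoidable bias ~0.5%, variance ~2% -> focus on variance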

Error Analysis
Sometimes, through manual error analysis, we get a lot of insights on what to do next.
• Whenever starting a project on which a lot of literature is not available, just do a quick and dirty implementation to get some idea about bias/variance and errors. Then start building upon that system.
• If you are building upon some problem on which a lot of literature is already available, then you can build a more complex system right from the start.
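One common way to do this manual analysis (the error categories below are only an assumed example, not from the notes) is to hand-label a batch of misclassified dev examples with the reason each one was wrong, then count which reasons dominate.

from collections import Counter

# Hypothetical labels assigned by hand while looking at misclassified dev examples
error_reasons = [
    "dog confused as cat", "blurry image", "blurry image", "mislabeled",
    "dog confused as cat", "blurry image", "great cat (lion/panther)", "blurry image",
]

counts = Counter(error_reasons)
total = len(error_reasons)
for reason, n in counts.most_common():
    print(f"{reason}: {n}/{total} = {100 * n / total:.0f}% of errors")
# Fixing the largest category gives the biggest possible error reduction.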

Training and Testing on different distributions

If we have 200k examples of high quality cat images,
and we have 10k examples of low quality images,
and we want to build a classifier for low quality images, what can we do?
1. 1st option: Take all the available images, randomly shuffle them, and divide them into Train, Dev & Test sets.
2. 2nd option: Take all 200k high quality images and 5k of the low quality images into the Training set. Take 2.5k low quality images into the Dev set & the remaining 2.5k low quality images into the Test set.
The 2nd option works better, because the dev and test sets then reflect the distribution we actually care about.
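A minimal sketch of the 2nd option, with two index lists standing in for the high and low quality images (names and sizes follow the example above):

import random

high_quality = [("hq", i) for i in range(200_000)]   # e.g. web-crawled, high quality
low_quality = [("lq", i) for i in range(10_000)]     # e.g. from the target application

random.seed(0)
random.shuffle(low_quality)

# Option 2: all high quality + half of the low quality images go to training;
# dev and test contain ONLY the low quality distribution we care about.
train = high_quality + low_quality[:5_000]
dev = low_quality[5_000:7_500]
test = low_quality[7_500:]
print(len(train), len(dev), len(test))  # 205000 2500 2500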

How to find whether your error is because of a difference between the Training data & Dev data, or because of High Variance?

Tr err = 9%
Dev err = 15%
Is it high variance? Or is it because of a high difference between the Train data and the Dev data?

To find this out, pull out some data from the training set (and remove it from the training set before training). We will call this data the train-dev set: it comes from the same distribution as the training data, but, like the dev set, it is not used for training.
Train the ML model on the remaining training data alone.
Find out the train-dev set error and the dev set error.
Consider the following cases:

                  Case 1          Case 2          Case 3      Case 4
Human Level err   ~0%             ~0%             ~0%         ~0%
Tr err            1%              1%              10%         10%
Tr-dev err        1.5%            9%              11%         11%
Dev err           10%             10%             12%         20%
Comments          Data mismatch   High variance   High bias   High bias + data mismatch
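A rough sketch of reading those gaps off the four error numbers; the 2% threshold is an arbitrary assumption, chosen only so the code reproduces the table's diagnoses.

def diagnose_gaps(human_err, train_err, train_dev_err, dev_err, gap=0.02):
    problems = []
    if train_err - human_err > gap:
        problems.append("high (avoidable) bias")
    if train_dev_err - train_err > gap:
        problems.append("high variance")
    if dev_err - train_dev_err > gap:
        problems.append("data mismatch between train and dev distributions")
    return problems or ["looks fine"]

# The four cases from the table above (errors as fractions)
print(diagnose_gaps(0.0, 0.01, 0.015, 0.10))  # -> data mismatch
print(diagnose_gaps(0.0, 0.01, 0.09, 0.10))   # -> high variance
print(diagnose_gaps(0.0, 0.10, 0.11, 0.12))   # -> high (avoidable) bias
print(diagnose_gaps(0.0, 0.10, 0.11, 0.20))   # -> high bias + data mismatch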

What to do in case of a high difference between the training & dev set data (data mismatch)?
1. Collect more training data similar to the dev set data.
2. Artificial data synthesis (for example, data augmentation such as appending car background noise to pure speech for the rear-view mirror problem; a sketch follows below). Just keep in mind that you should not synthesize data from only a tiny subset of the space of possible examples, or the model may overfit to that subset.
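A rough sketch of that kind of synthesis, assuming the clean speech and the car noise are already available as NumPy arrays (the signals below are random placeholders, and the mixing level is made up):

import numpy as np

def synthesize_noisy_speech(clean, noise, noise_scale=0.3, rng=None):
    """Mix background noise into clean speech at a chosen level."""
    rng = rng or np.random.default_rng(0)
    # Pick a random offset so we don't reuse the exact same noise segment every time,
    # which would effectively synthesize from a tiny subset of possible examples.
    start = rng.integers(0, len(noise) - len(clean))
    segment = noise[start:start + len(clean)]
    return clean + noise_scale * segment

# Placeholder signals standing in for real audio (1 s of speech, 60 s of car noise at 16 kHz)
rng = np.random.default_rng(0)
clean_speech = rng.standard_normal(16_000)
car_noise = rng.standard_normal(16_000 * 60)
noisy = synthesize_noisy_speech(clean_speech, car_noise, rng=rng)
print(noisy.shape)  # (16000,)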

Transfer Learning:
Learning to recognize cats, and then using that network to read X-ray scans.
Using a model trained on some data (pre-training), and then training that model further on some other data (fine-tuning).
Particularly useful when you have a relatively small dataset for the new task.
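A minimal PyTorch-style sketch of the idea; the network here is randomly initialized and only stands in for a genuinely pre-trained model. Freeze the shared layers and re-train a new output head on the small new dataset.

import torch
import torch.nn as nn

# Stand-in for a network pre-trained on a large source task (e.g. cats)
pretrained = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),            # original output head for the source task
)

# Transfer: freeze the shared layers, replace the head for the new task (e.g. 2 classes)
for param in pretrained.parameters():
    param.requires_grad = False
pretrained[-1] = nn.Linear(64, 2)   # new head, trainable by default

optimizer = torch.optim.Adam(
    (p for p in pretrained.parameters() if p.requires_grad), lr=1e-3
)
loss_fn = nn.CrossEntropyLoss()

# One fine-tuning step on a made-up batch from the new (small) dataset
x = torch.randn(16, 1, 28, 28)
y = torch.randint(0, 2, (16,))
loss = loss_fn(pretrained(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))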

Multi-task learning:
Training a single network to perform multiple tasks at the same time, so that what is learned for each task helps the final model on all of them.
Example: training one model to detect cars, stop signs, pedestrians, etc., and using all of these tasks together to train your final model.
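A minimal PyTorch-style sketch under the usual multi-task setup (one shared network with one binary output per task; the three task names just mirror the example above): a single forward pass produces all task predictions and one combined loss trains the shared model.

import torch
import torch.nn as nn

tasks = ["car", "stop_sign", "pedestrian"]

# One shared network with one logit output per task
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 32, 64), nn.ReLU(),
    nn.Linear(64, len(tasks)),      # one logit per task
)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Made-up batch: each image can contain any subset of the objects
x = torch.randn(8, 1, 32, 32)
y = torch.randint(0, 2, (8, len(tasks))).float()   # multi-label targets

logits = model(x)                   # shape (8, 3): all tasks from one shared model
loss = loss_fn(logits, y)           # combined loss over all tasks
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))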

End-to-end deep learning


For speech recognition, the traditional pipeline requires:
Audio --> Features --> Phonemes --> Words --> Transcript
What end-to-end DL does is:
Audio --------------------------------------------------> Transcript
It requires a larger and deeper NN, and a lot more data.
