15 Improving Performance - Hacks & Tricks
Data Preparation
Data pre-processing techniques generally refer to the addition, deletion, or transformation of training set data.
Data augmentation refers to techniques used to increase the amount of training data by adding slightly modified copies of existing data, or newly created synthetic data derived from existing data.
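As a rough sketch (assuming an image-classification setting and the Keras ImageDataGenerator API, which the slides do not prescribe), augmentation can be configured to generate randomly shifted, rotated, and flipped copies of the training images on the fly:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Each parameter defines a random, label-preserving modification of the inputs.
augmenter = ImageDataGenerator(
    rotation_range=15,        # rotate up to +/- 15 degrees
    width_shift_range=0.1,    # shift horizontally up to 10% of the width
    height_shift_range=0.1,   # shift vertically up to 10% of the height
    horizontal_flip=True,     # randomly mirror images
)

# X_train and y_train are placeholders for the existing training set.
# flow() yields batches of modified copies during training, e.g.:
# model.fit(augmenter.flow(X_train, y_train, batch_size=32), epochs=10)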
Data Split
70% train, 15% validation, 15% test
80% train, 10% validation, 10% test
60% train, 20% validation, 20% test
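A minimal sketch of a 70/15/15 split, assuming scikit-learn's train_test_split and placeholder arrays X and y holding the full dataset:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 10)              # placeholder features
y = np.random.randint(0, 2, size=1000)    # placeholder labels

# First hold out 30% of the data, then split that portion half-and-half
# into validation and test sets, giving roughly 70% / 15% / 15%.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=42)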
Performance Metrics
The function we want to minimize or maximize is called the objective function or criterion. When we are minimizing it, we may also call it the cost function, loss function, or error function.
Loss
Learning Rate
Ideal Curves
Early Stopping
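As an illustrative sketch (assuming a Keras model and a held-out validation set, neither of which is fixed by the slides), early stopping can be implemented as a callback that halts training once the validation loss stops improving:

from tensorflow.keras.callbacks import EarlyStopping

# Stop once val_loss has not improved for 5 consecutive epochs, and
# roll the model back to the best weights seen so far.
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# model.fit(X_train, y_train,
#           validation_data=(X_val, y_val),
#           epochs=100,
#           callbacks=[early_stop])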
Validation
How many Neurons?
[Figure: network diagram X, W, Y; the number of neurons needed depends on the problem complexity]
How many Layers?
Big Network
ImageNet Challenge
Over-fitting
Under-fitting
Good-fit
Big Network with Dropout
Regularization
[Figure: regularization driving weights toward zero: W1 = 0, W2 = 0, W3 = 0, ...]
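A minimal sketch (assuming Keras; the layer sizes, dropout rate, and the choice of an L2 weight penalty are all illustrative rather than prescribed by the slides) of a large network with dropout between layers and a weight penalty, both of which push the model away from over-fitting:

from tensorflow.keras import layers, models, regularizers

# A deliberately large fully connected network.
model = models.Sequential([
    layers.Dense(512, activation='relu', input_shape=(100,),
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.5),      # randomly zero 50% of activations during training
    layers.Dense(512, activation='relu',
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),
])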
Which Activation Function?
Recall: Back Propagation
[Figure: forward pass x -> f1 -> y1 -> f2 -> y2 -> ... -> J(w), with weights w1, w2, ...; gradients are propagated back through each layer]
Data Scaling: -1 to +1 (a common choice when using tanh, whose output range is -1 to +1)
Data Scaling: 0 to 1 (a common choice when using sigmoid, whose output range is 0 to 1)
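A brief sketch (assuming scikit-learn's MinMaxScaler and placeholder data) of rescaling features into either of these ranges:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.random.rand(200, 5) * 100.0           # placeholder unscaled features

# Fit the scaler on the training data only, then reuse it for validation/test data.
scaler_01 = MinMaxScaler(feature_range=(0, 1)).fit(X)
scaler_pm1 = MinMaxScaler(feature_range=(-1, 1)).fit(X)

X_01 = scaler_01.transform(X)     # values now in [0, 1]
X_pm1 = scaler_pm1.transform(X)   # values now in [-1, 1]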
Weight Initialization?
In general practice, biases are initialized to 0 and weights are initialized with small numbers drawn randomly from a Gaussian or uniform distribution over a range such as [0, 1], [-1, 1], or [-0.3, 0.3].
Sigmoid / Tanh
In the Xavier technique, weights are initialized with small numbers drawn randomly from a uniform probability distribution over the range [-1/sqrt(n), 1/sqrt(n)], where n is the number of inputs to the neuron.
ReLU
In the He normal technique, weights are initialized with small numbers drawn randomly from a Gaussian probability distribution with mean 0.0 and standard deviation sqrt(2/n), where n is the number of inputs to the neuron.
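A small sketch of both rules in NumPy, for one layer with n inputs and m outputs (the layer sizes are illustrative):

import numpy as np

n, m = 256, 128                       # illustrative fan-in and fan-out of one layer
rng = np.random.default_rng(0)

# Xavier (Glorot) uniform: suited to sigmoid/tanh activations.
limit = 1.0 / np.sqrt(n)
W_xavier = rng.uniform(-limit, limit, size=(n, m))

# He normal: suited to ReLU activations.
W_he = rng.normal(loc=0.0, scale=np.sqrt(2.0 / n), size=(n, m))

# Biases are typically initialized to zero.
b = np.zeros(m)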
Each time a neural network is initialized with a different set of weights, it starts from a different point in the search space and may therefore converge to a different final set of weights with different performance characteristics.
Which Loss Function?
Binary cross-entropy calculates a score that summarizes the average difference between the actual and predicted probability distributions for predicting class 1. The score is minimized, and a perfect cross-entropy value is 0.
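A worked sketch of the binary cross-entropy computation in NumPy (the labels and predicted probabilities are made up for illustration):

import numpy as np

y_true = np.array([1, 0, 1, 1])              # actual class labels
y_pred = np.array([0.9, 0.1, 0.8, 0.6])      # predicted probability of class 1

eps = 1e-12                                  # avoid log(0)
# Average negative log-likelihood of the true class under the predictions.
bce = -np.mean(y_true * np.log(y_pred + eps)
               + (1 - y_true) * np.log(1 - y_pred + eps))
print(bce)   # roughly 0.24; it would be 0 for perfect predictions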
Multi-class cross-entropy calculates a score that summarizes the average difference between the actual and predicted probability distributions across all classes in the problem. The score is minimized, and a perfect cross-entropy value is 0.
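And the multi-class counterpart, again with made-up one-hot labels and softmax-style predicted probabilities:

import numpy as np

y_true = np.array([[1, 0, 0],                # one-hot actual labels
                   [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1],          # predicted class probabilities
                   [0.1, 0.8, 0.1]])

eps = 1e-12
# Average negative log-probability assigned to the true class of each example.
cce = -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))
print(cce)   # roughly 0.29; 0 for perfect predictions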
Which Loss Optimization Function?
[Figure: a prediction of 0.8 against an actual value of 1 feeding into the loss J(W)]
Batch
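As a rough sketch (assuming Keras; the placeholder data, tiny model, Adam optimizer, and batch size of 32 are all illustrative choices), the optimizer, loss, batch size, and number of epochs are specified when compiling and fitting the model:

import numpy as np
from tensorflow.keras import layers, models, optimizers

X_train = np.random.rand(256, 10)                 # placeholder features
y_train = np.random.randint(0, 2, size=256)       # placeholder labels

model = models.Sequential([
    layers.Dense(16, activation='relu', input_shape=(10,)),
    layers.Dense(1, activation='sigmoid'),
])

# Optimizer and loss are chosen at compile time (Adam here is only an example).
model.compile(optimizer=optimizers.Adam(learning_rate=1e-3),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# The loss is computed and the weights updated once per mini-batch of 32
# examples; one epoch is one full pass over the training data.
model.fit(X_train, y_train, batch_size=32, epochs=5, validation_split=0.2)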
1. Activation Functions
2. Network Topology
3. Batches and Epochs
4. Dropout
5. Optimization and Loss
6. Early Stopping
Transfer Learning
Transfer learning generally refers to a process where a model trained on one problem is used in some way on a second, related problem.
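A minimal sketch (assuming Keras and the ImageNet-pretrained MobileNetV2 weights that ship with tensorflow.keras.applications; the 10-class head is illustrative) of reusing a pretrained model as a frozen feature extractor for a new task:

from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

# Load a network pretrained on ImageNet, without its original classifier head.
base = MobileNetV2(weights='imagenet', include_top=False,
                   input_shape=(224, 224, 3), pooling='avg')
base.trainable = False            # freeze the pretrained weights

# Add a new head for the second, related problem.
model = models.Sequential([
    base,
    layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(new_X_train, new_y_train, epochs=5)   # trains only the new head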